In the common linear regression model the problem of determining
optimal designs for least squares estimation is considered in
the case where the observations are correlated. A necessary condi- 
tion for the optimality of a given design is provided, which extends 
the classical equivalence theory for optimal designs in models with 
uncorrelated errors to the case of dependent data. If the regression 
functions are eigenfunctions of an integral operator defined by the 
covariance kernel, it is shown that the corresponding measure defines 
a universally optimal design. For several models universally optimal 
designs can be identified explicitly. In particular, it is proved that the 
uniform distribution is universally optimal for a class of trigonometric 
regression models with a broad class of covariance kernels and that 
the arcsine distribution is universally optimal for the polynomial re- 
gression model with correlation structure defined by the logarithmic 
potential. To the best knowledge of the authors these findings pro- 
vide the first explicit results on optimal designs for regression models 
with correlated observations, which are not restricted to the location 
scale model. 

1. Introduction. Consider the common linear regression model 

(1.1) $y(x) = \theta_1 f_1(x) + \cdots + \theta_m f_m(x) + \varepsilon(x)$,

where $f_1(x), \ldots, f_m(x)$ are linearly independent, continuous functions, $\varepsilon(x)$
denotes a random error process or field, $\theta_1, \ldots, \theta_m$ are unknown parameters
and $x$ is the explanatory variable, which varies in a compact design space
$\mathcal{X} \subset \mathbb{R}^d$. We assume that $N$ observations, say $y_1, \ldots, y_N$, can be taken at
experimental conditions $x_1, \ldots, x_N$ to estimate the parameters in the linear
regression model (1.1). If an appropriate estimate, say $\hat\theta$, of the parameter
$\theta = (\theta_1, \ldots, \theta_m)^T$ has been chosen, the quality of the statistical analysis can
be further improved by choosing an appropriate design for the experiment.
In particular, an optimal design minimizes a functional of the variance-covariance
matrix of the estimate $\hat\theta$, where the functional should reflect
certain aspects of the goal of the experiment. In contrast to the case of
uncorrelated errors, where numerous results and a rather complete theory are
available [see, e.g., the monograph of Pukelsheim (2006)], the construction
of optimal designs for dependent observations is intrinsically more difficult.
On the other hand, this problem is of particular practical interest as in most 
applications there exists correlation between different observations. Typical 
examples include models, where the explanatory variable x represents the 
time and all observations correspond to one subject. In such situations op- 
timal experimental designs are very difficult to find even in simple cases. 
Some exact optimal design problems were considered in Boltze and Näther
(1982), Näther [(1985a), Chapter 4], Näther (1985b), Pázman and Müller
(2001) and Müller and Pázman (2003), who derived optimal designs for the
location scale model

(1.2) $y(x) = \theta + \varepsilon(x)$.

Exact optimal designs for specific linear models have been investigated in
Dette, Kunert and Pepelyshev (2008), Kiseľák and Stehlík (2008), Harman
and Štulajter (2010). Because explicit solutions of optimal design problems
for correlated observations are rarely available, several authors have
proposed to determine optimal designs based on asymptotic arguments [see,
e.g., Sacks and Ylvisaker (1966, 1968), Bickel and Herzberg (1979), Näther
(1985a), Zhigljavsky, Dette and Pepelyshev (2010)]. Roughly speaking, there
exist three approaches to embed the optimal design problem for regression
models with correlated observations in an asymptotic optimal design
problem. The first one is due to Sacks and Ylvisaker (1966, 1968), who
assumed that the covariance structure of the error process $\varepsilon(x)$ is fixed and
that the number of design points tends to infinity. Alternatively, Bickel and 
Herzberg (1979) and Bickel, Herzberg and Schilling (1981) considered a dif- 
ferent model, where the correlation function depends on the sample size. 
Recently, Zhigljavsky, Dette and Pepelyshev (2010) extended the Bickel- 
Herzberg approach and allowed the variance (in addition to the correlation 
function) to vary as the number of observations changes. As a result, the cor- 
responding optimality criteria contain a kernel with singularity at zero. The 
focus in all these papers is again mainly on the location scale model (1.2). 

The difficulties in the development of the optimal design theory for
correlated observations can be explained by a different structure of the covariance
of the least squares estimator in model (1.1), which is of the form $M^{-1} B M^{-1}$
for certain matrices $M$ and $B$ depending on the design. As a consequence,
the corresponding design problems are in general not convex [except for the
location scale model (1.2) where $M = 1$].

The present paper is devoted to the problem of determining optimal de- 
signs for more general models with correlated observations than the simple 
location scale model (1.2). In Section 2 we present some preliminary dis- 
cussion and introduce the necessary notation. In Section 3 we investigate 
general conditions for design optimality. One of the main results of the pa- 
per is Theorem 3.3, where we derive necessary and sufficient conditions for 
the universal optimality of designs. By relating the optimal design prob- 
lems to eigenvalue problems for integral operators we identify a broad class 
of multi-parameter regression models where the universally optimal designs 
can be determined explicitly. It is also shown that in this case the least 
squares estimate with the corresponding optimal design has the same covari- 
ance matrix as the weighted least squares estimates with its optimal design. 
In other words, under the conditions of Theorem 3.3 least squares estima- 
tion combined with an optimal design can never be improved by weighted 
least squares estimation. In Section 4 several applications are presented. In 
particular, we show that for a trigonometric system of regression functions 
involving only cosine terms with an arbitrary periodic covariance kernel,
the uniform distribution is universally optimal. We also prove that the arc- 
sine design is universally optimal for the polynomial regression model with 
the logarithmic covariance kernel and derive some universal optimality prop- 
erties of the Beta distribution. To our best knowledge these results provide 
the first explicit solution of optimal design problems for regression models 
with correlated observations which differ from the location scale model. 

In Section 5 we provide an algorithm for computing optimal designs for 
any regression model with specified covariance function and investigate the 
efficiency of the arcsine and uniform distribution in polynomial regression 
models with exponential correlation functions. Finally, Section 6 contains 
some conclusions and technical details are given in the Appendix. 

2. Preliminaries. 

2.1. The asymptotic covariance matrix. Consider the linear regression
model (1.1), where $\varepsilon(x)$ is a stochastic process with

(2.1) $E\varepsilon(x) = 0, \quad E\varepsilon(x)\varepsilon(x') = K(x, x'); \quad x, x' \in \mathcal{X} \subset \mathbb{R}^d$;

the function $K(x, x')$ is called covariance kernel. If $N$ observations, say
$\mathbf{y} = (y_1, \ldots, y_N)^T$, are available at experimental conditions $x_1, \ldots, x_N$ and the
covariance kernel is known, the vector of parameters can be estimated by the
weighted least squares method, that is, $\hat\theta = (\mathbf{X}^T \Sigma^{-1} \mathbf{X})^{-1} \mathbf{X}^T \Sigma^{-1} \mathbf{y}$, where
$\mathbf{X} = (f_i(x_j))_{j=1,\ldots,N}^{i=1,\ldots,m}$ is the $N \times m$ design matrix and
$\Sigma = (K(x_i, x_j))_{i,j=1,\ldots,N}$. The variance-covariance
matrix of this estimate is given by

$\mathrm{Var}(\hat\theta) = (\mathbf{X}^T \Sigma^{-1} \mathbf{X})^{-1}$.

If the correlation structure of the process is not known, one usually uses the
ordinary least squares estimate $\tilde\theta = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$, which has the covariance
matrix

(2.2) $\mathrm{Var}(\tilde\theta) = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \Sigma \mathbf{X} (\mathbf{X}^T \mathbf{X})^{-1}$.

An exact experimental design $\xi_N = \{x_1, \ldots, x_N\}$ is a collection of $N$ points
in X, which defines the time points or experimental conditions where ob- 
servations are taken. Optimal designs for weighted or ordinary least squares 
estimation minimize a functional of the covariance matrix of the weighted or 
ordinary least squares estimate, respectively, and numerous optimality crite- 
ria have been proposed in the literature to discriminate between competing 
designs; see Pukelsheim (2006). 

Note that the weighted least squares estimate can only be used if the 
correlation structure of the errors is known, and its misspecification can 
lead to a considerable loss of efficiency. At the same time, the ordinary least 
squares estimate does not employ the structure of the correlation. Obviously 
the ordinary least squares estimate can be less efficient than the weighted 
least squares estimate, but in many cases the loss of efficiency is small. For 
example, consider the location scale model (1.2) with a stationary error
process, the Gaussian correlation function $\rho(t) = e^{-\lambda t^2}$ and the exact design
$\xi = \{-1, -2/3, -1/3, 1/3, 2/3, 1\}$. Suppose that the guessed value of $\lambda$ equals
1 while the true value is 2. Then the variance of the weighted least squares
estimate is 0.528 computed as

$(\mathbf{X}^T \Sigma_{\mathrm{guess}}^{-1} \mathbf{X})^{-1} \mathbf{X}^T \Sigma_{\mathrm{guess}}^{-1} \Sigma_{\mathrm{true}} \Sigma_{\mathrm{guess}}^{-1} \mathbf{X} (\mathbf{X}^T \Sigma_{\mathrm{guess}}^{-1} \mathbf{X})^{-1}$,

while the variance of the ordinary least squares estimate is 0.433. If the
guessed value of $\lambda$ equals the true value, then the variance of the weighted
least squares estimate is 0.382. A similar relation between the variances
holds if the location scale model and the Gaussian correlation function are
replaced by a polynomial model and a triangular or exponential correlation
function, respectively. For a more detailed discussion concerning advantages
of the ordinary least squares over the weighted least squares estimate, see
Bickel and Herzberg (1979) and Section 5.1 in Näther (1985a).
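
The three variances in this comparison can be recomputed with a few lines of NumPy. The following is a minimal sketch, assuming the Gaussian correlation function has the form $\rho(t) = e^{-\lambda t^2}$; the helper names `gaussian_cov` and `sandwich_variance` are hypothetical and introduced only for illustration.

```python
import numpy as np

def gaussian_cov(points, lam):
    """Matrix with entries rho(x_i - x_j), where rho(t) = exp(-lam * t**2)."""
    diff = points[:, None] - points[None, :]
    return np.exp(-lam * diff**2)

def sandwich_variance(X, sigma_used, sigma_true):
    """Covariance of (X' S^-1 X)^-1 X' S^-1 y with S = sigma_used, Var(y) = sigma_true."""
    A = np.linalg.solve(sigma_used, X)        # S^-1 X
    bread = np.linalg.inv(X.T @ A)            # (X' S^-1 X)^-1
    return bread @ (A.T @ sigma_true @ A) @ bread

points = np.array([-1, -2/3, -1/3, 1/3, 2/3, 1])
X = np.ones((points.size, 1))                 # location scale model: f(x) = 1
sigma_true = gaussian_cov(points, lam=2.0)    # true lambda = 2
sigma_guess = gaussian_cov(points, lam=1.0)   # guessed lambda = 1

# Weighted least squares with the misspecified (guessed) covariance.
var_wls_misspecified = sandwich_variance(X, sigma_guess, sigma_true)[0, 0]
# Ordinary least squares corresponds to using the identity as weight matrix.
var_ols = sandwich_variance(X, np.eye(points.size), sigma_true)[0, 0]
# Weighted least squares with the correctly specified covariance.
var_wls_true = sandwich_variance(X, sigma_true, sigma_true)[0, 0]
```

The three scalars can be compared with the values quoted in the text; by the Gauss-Markov theorem, `var_wls_true` cannot exceed either of the other two.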

Throughout this article we will concentrate on optimal designs for the 
ordinary least squares estimate. These designs also require the specification 
of the correlation structure but a potential loss by its misspecification in the 
stage of design construction is typically much smaller than the loss caused by 
the misspecification of the correlation structure in the weighted least squares
estimate. Moreover, in this paper we will demonstrate that there are many
situations, where the combination of the ordinary least squares estimate with 
the corresponding (universally) optimal design yields the same covariance 
matrix as the weighted least squares estimate on the basis of a (universally) 
optimal design for weighted least squares estimation; see the discussions in 
Sections 4 and 6. 

Because even in simple models the exact optimal designs are difficult to 
find, most authors usually use asymptotic arguments to determine efficient 
designs for the estimation of the model parameters; see Sacks and Ylvisaker 
(1966, 1968), Bickel and Herzberg (1979) or Zhigljavsky, Dette and Pepelyshev
(2010). Sacks and Ylvisaker (1966, 1968) and Näther [(1985a), Chapter
4], assumed that the design points $\{x_1, \ldots, x_N\}$ are generated by the
quantiles of a distribution function, that is,

(2.3) $x_i = a((i-1)/(N-1)), \quad i = 1, \ldots, N$,

where the function $a : [0, 1] \to \mathcal{X}$ is the inverse of a distribution function. If
$\xi_N$ denotes a design with $N$ points and corresponding quantile function $a(\cdot)$,
the covariance matrix of the least squares estimate $\tilde\theta = \tilde\theta_{\xi_N}$ given in (2.2)
can be written as

(2.4) $\mathrm{Var}(\tilde\theta) = D(\xi_N) = M^{-1}(\xi_N) B(\xi_N, \xi_N) M^{-1}(\xi_N)$,

where

(2.5) $M(\xi_N) = \int_{\mathcal{X}} f(u) f^T(u) \xi_N(du)$,

(2.6) $B(\xi_N, \xi_N) = \int\!\!\int K(u, v) f(u) f^T(v) \xi_N(du) \xi_N(dv)$,

and $f(u) = (f_1(u), \ldots, f_m(u))^T$ denotes the vector of regression functions.
Following Kiefer (1974) we call any probability measure $\xi$ on $\mathcal{X}$ (more
precisely on an appropriate Borel field) an approximate design or simply design.
The definition of the matrices $M(\xi)$ and $B(\xi, \xi)$ can be extended to an
arbitrary design $\xi$, provided that the corresponding integrals exist. The matrix

(2.7) $D(\xi) = M^{-1}(\xi) B(\xi, \xi) M^{-1}(\xi)$

is called the covariance matrix for the design $\xi$ and can be defined for any
probability measure $\xi$ supported on the design space $\mathcal{X}$ such that the
matrices $B(\xi, \xi)$ and $M^{-1}(\xi)$ are well defined. This set will be denoted by $\Xi$.
An (approximate) optimal design minimizes a functional of the covariance
matrix $D(\xi)$ over the set $\Xi$ and a universally optimal design $\xi^*$ (if it exists)
minimizes the matrix $D(\xi)$ with respect to the Loewner ordering, that is,

$D(\xi^*) \le D(\xi)$ for all $\xi \in \Xi$.

Note that on the basis of this asymptotic analysis the kernel $K(u, v)$ has to
be well defined for all $u, v \in \mathcal{X}$. On the other hand, Zhigljavsky, Dette and
Pepelyshev (2010) extended the approach in Bickel and Herzberg (1979) and
proposed an alternative approximation for the covariance matrix in (2.2),
where the variance of the observations also depends on the sample size. As
a result they obtained an approximating matrix of the form (2.6), where the
kernel $K(u, v)$ in the matrix may have singularities at the diagonal.
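
For a design with finitely many support points the integrals in (2.5) and (2.6) reduce to weighted sums, so the matrix $D(\xi)$ in (2.7) can be evaluated directly. The sketch below is illustrative only: the function name `design_matrices`, the quadratic regression functions, the exponential kernel and the equally spaced points are assumptions chosen for the example. For equal weights $1/N$ the result coincides with the ordinary least squares covariance matrix (2.2).

```python
import numpy as np

def design_matrices(points, weights, f, kernel):
    """Return M(xi), B(xi, xi) and D(xi) for the discrete design {points; weights}."""
    F = np.array([f(x) for x in points])                    # row i is f(x_i)^T
    K = np.array([[kernel(u, v) for v in points] for u in points])
    W = np.diag(weights)
    M = F.T @ W @ F                                         # weighted sum for (2.5)
    B = F.T @ W @ K @ W @ F                                 # weighted double sum for (2.6)
    M_inv = np.linalg.inv(M)
    return M, B, M_inv @ B @ M_inv                          # D(xi) as in (2.7)

def f(x):
    return np.array([1.0, x, x**2])                         # assumed regression functions

def kernel(u, v):
    return np.exp(-abs(u - v))                              # assumed covariance kernel

N = 7
points = np.linspace(-1.0, 1.0, N)      # (2.3) with quantile function a(u) = 2u - 1
weights = np.full(N, 1.0 / N)
M, B, D = design_matrices(points, weights, f, kernel)

# Ordinary least squares covariance (2.2) for the same points.
X = np.array([f(x) for x in points])
Sigma = np.array([[kernel(u, v) for v in points] for u in points])
XtX_inv = np.linalg.inv(X.T @ X)
ols_cov = XtX_inv @ X.T @ Sigma @ X @ XtX_inv
```

Unequal weights give an approximate design in the sense of Kiefer (1974); the same function applies unchanged.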

Note that in general the function $D(\xi)$ is not convex (with respect to the
Loewner ordering) on the space of all approximate designs. This implies that
even if one determines optimal designs by minimizing a convex functional,
say $\Phi$, of the matrix $D(\xi)$, the corresponding functional $\xi \to \Phi(D(\xi))$ is
generally not convex on the space of designs $\Xi$. Consider, for example, the
case $m = 1$ where $D(\xi)$ is given by

(2.8) $D(\xi) = \left[ \int f^2(u) \xi(du) \right]^{-2} \int\!\!\int K(u, v) f(u) f(v) \xi(du) \xi(dv)$,

and it is obvious that this functional is not necessarily convex. On the other
hand, for the location scale model (1.2) we have $m = 1$, $f(x) = 1$ for all $x$ and
this expression reduces to $D(\xi) = \int\!\!\int K(u, v) \xi(du) \xi(dv)$. In the stationary
case $K(u, v) = \sigma^2 \rho(u - v)$, where $\rho(\cdot)$ is a correlation function, this functional
is convex on the set of all probability measures on the domain $\mathcal{X}$; see Lemma
1 in Zhigljavsky, Dette and Pepelyshev (2010) and Lemma 4.3 in Näther
(1985a). For this reason [namely the convexity of the functional $D(\xi)$] most
of the literature discussing asymptotic optimal design problems for least
squares estimation in the presence of correlated observations considers the
location scale model, which corresponds to the estimation of the mean of
a stationary process; see, for example, Boltze and Näther (1982), Näther
(1985a, 1985b).

2.2. Covariance kernels. Consider the covariance kernels $K(u, v)$ that
appeared in (2.1). An important case appears when the error process is
stationary and the covariance kernel is of the form $K(u, v) = \sigma^2 \rho(u - v)$,
where $\rho(0) = 1$ and $\rho(\cdot)$ is called the correlation function.

Because in this paper we are interested in designs minimizing functionals
of the matrix $D(\xi)$ independently of the type of approximation which has
been used to derive it, we will also consider singular kernels in the following
discussion. Moreover, we call $K(u, v)$ covariance kernel even if it has
singularities at the diagonal.

The covariance kernels with singularities at the diagonal can be used as
approximations to the standard covariance kernels. They naturally appear as
limits of sequences of covariance kernels satisfying $K_N(u, v) = \sigma_N^2 \rho_N(u - v)$,
where $\rho_N(t) = \rho(a_N t)$, $\sigma_N^2 = a_N^{\alpha} \tau^2$, $\tau > 0$, $0 < \alpha \le 1$ is a constant depending
on the asymptotic behavior of the function $\rho(t)$ as $t \to \infty$, and $\{a_N\}_{N \in \mathbb{N}}$
denotes a sequence of positive numbers satisfying $a_N \to \infty$ as $N \to \infty$.
Consider, for example, the correlation function $\rho(t) = 1/(1 + |t|)^{\alpha}$ which is
nonsingular. Then the sequence of functions

$\sigma_N^2 \rho_N(t) = a_N^{\alpha} \tau^2 \frac{1}{(1 + |a_N t|)^{\alpha}} = \tau^2 \frac{1}{(1/a_N + |t|)^{\alpha}}$

converges to $\tau^2 r_\alpha(t)$ with $r_\alpha(t) = 1/|t|^{\alpha}$ as $N \to \infty$. For slightly different types of
approximation, see Examples 4.2 and 4.3 below.
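
The convergence of the rescaled covariance functions to the singular kernel can be observed numerically. The sketch below assumes $\tau = 1$ and $\alpha = 1/2$; the function names and the chosen values of $a_N$ and $t$ are hypothetical and serve only as an illustration.

```python
import numpy as np

def scaled_kernel(t, a_n, alpha, tau=1.0):
    """sigma_N^2 * rho_N(t) for rho(t) = 1/(1+|t|)^alpha and sigma_N^2 = a_N^alpha * tau^2."""
    return a_n**alpha * tau**2 / (1.0 + np.abs(a_n * t))**alpha

def singular_limit(t, alpha, tau=1.0):
    """Pointwise limit tau^2 / |t|^alpha, defined for t != 0."""
    return tau**2 / np.abs(t)**alpha

alpha = 0.5
t = np.array([0.05, 0.2, 1.0])            # evaluation points away from the singularity
scales = [1e1, 1e3, 1e5]                  # increasing values of a_N

# Largest gap between the rescaled covariance function and its limit, per scale.
gaps = [float(np.max(singular_limit(t, alpha) - scaled_kernel(t, a_n, alpha)))
        for a_n in scales]
```

The gaps are nonnegative and shrink as $a_N$ grows, in line with the monotone approximation from below that is assumed for singular kernels in this section.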

Let us summarize the assumptions regarding the covariance kernel. First,
we assume that $K$ is symmetric and continuous at all points $(u, v) \in \mathcal{X} \times \mathcal{X}$
except possibly at the diagonal points $(u, u)$. We also assume that $K(u, v) \ne 0$
for at least one pair $(u, v)$ with $u \ne v$. Any covariance kernel $K(\cdot, \cdot)$ considered
in this paper is assumed to be positive definite in the following sense:
for any signed measure $\nu(du)$ on $\mathcal{X}$, we have

(2.9) $\int\!\!\int K(u, v) \nu(du) \nu(dv) \ge 0$.

If the kernel $K(u, v)$ has singularities at the diagonal then the assumptions
we make are as follows. We assume that $K(u, v) = r(u - v)$, where $r(\cdot)$ is a
function on $\mathbb{R} \setminus \{0\}$ with $0 \le r(t) < \infty$ for all $t \ne 0$ and $r(0) = +\infty$. We also
assume that there exists a monotonically increasing sequence $\{\sigma_N^2 \rho_N(t)\}_{N \in \mathbb{N}}$
of covariance functions such that $0 \le \sigma_N^2 \rho_N(t) \le r(t)$ for all $t$ and all
$N = 1, 2, \ldots$ and $r(t) = \lim_{N \to \infty} \sigma_N^2 \rho_N(t)$. Theorem 5 in Zhigljavsky, Dette and
Pepelyshev (2010) then guarantees that for this kernel we also have the
property of positive definiteness (2.9).

2.3. The set of admissible design points. Consider the vector-function
$f(x) = (f_1(x), \ldots, f_m(x))^T$ used in the definition of the regression model
(1.1). Define the sets $\mathcal{X}_0 = \{x \in \mathcal{X} : f(x) = 0\}$ and
$\mathcal{X}_1 = \mathcal{X} \setminus \mathcal{X}_0 = \{x \in \mathcal{X} : f(x) \ne 0\}$
and assume that designs $\xi_0$ and $\xi_1$ are concentrated on $\mathcal{X}_0$ and $\mathcal{X}_1$,
respectively. Consider the design $\xi_\alpha = \alpha \xi_0 + (1 - \alpha) \xi_1$ with $0 \le \alpha < 1$;
note that if the design $\xi_\alpha$ is concentrated on the set $\mathcal{X}_0$ only (corresponding
to the case $\alpha = 1$), then the construction of estimates is not possible. We
have

$M(\xi_\alpha) = \int f(x) f^T(x) \xi_\alpha(dx) = (1 - \alpha) M(\xi_1), \quad M^{-1}(\xi_\alpha) = \frac{1}{1 - \alpha} M^{-1}(\xi_1)$

and

$B(\xi_\alpha, \xi_\alpha) = \int\!\!\int K(x, u) f(x) f^T(u) \xi_\alpha(dx) \xi_\alpha(du) = (1 - \alpha)^2 B(\xi_1, \xi_1)$.

Therefore, for all $0 \le \alpha < 1$ we have

$D(\xi_\alpha) = M^{-1}(\xi_\alpha) B(\xi_\alpha, \xi_\alpha) M^{-1}(\xi_\alpha) = M^{-1}(\xi_1) B(\xi_1, \xi_1) M^{-1}(\xi_1) = D(\xi_1)$.

Consequently, observations taken at points from the set $\mathcal{X}_0$ do not change
the estimate $\tilde\theta$ and its covariance matrix. If we use the convention $0 \cdot \infty = 0$,
it follows that $\int\!\!\int K(x, u) f(x) f^T(u) \xi_0(dx) \xi_0(du) = 0$, and this statement is
also true for the covariance kernels $K(x, u)$ with singularity at $x = u$.

Summarizing this discussion, we assume throughout this paper that $f(x) \ne 0$
for all $x \in \mathcal{X}$.

3. Characterizations of optimal designs. 

3.1. General optimality criteria. Recall the definition of the information
matrix in (2.5) and define

$B(\xi, \nu) = \int\!\!\int K(u, v) f(u) f^T(v) \xi(du) \nu(dv)$,

where $\xi$ and $\nu$ are two arbitrary designs, and $K(u, v)$ is a covariance
kernel.

According to the discussion in the previous paragraph, the asymptotic
covariance matrix of the least squares estimator $\tilde\theta$ is proportional to the
matrix $D(\xi)$ defined in (2.4). Let $\Phi(\cdot)$ be a monotone, real valued functional
defined on the space of symmetric $m \times m$ matrices where the monotonicity
of $\Phi(\cdot)$ means that $A \ge B$ implies $\Phi(A) \ge \Phi(B)$. Then the optimal design
minimizes the function

$\Phi(D(\xi))$

on the space $\Xi$ of all designs. In addition to monotonicity, we shall also
assume differentiability of the functional $\Phi(\cdot)$; that is, the existence of the
matrix of derivatives

$C = \frac{\partial \Phi(D)}{\partial D} = \left( \frac{\partial \Phi(D)}{\partial D_{ij}} \right)_{i,j=1,\ldots,m}$,

where $D$ is any symmetric nonnegative definite matrix of size $m \times m$. The
following lemma is crucial in the proof of the optimality theorem below.

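Lemma 3.1. Let $\xi$ and $\nu$ be two designs and $\Phi$ be a differentiable
functional. Set $\xi_\alpha = (1 - \alpha) \xi + \alpha \nu$ and assume that the matrices $M(\xi)$ and $B(\xi, \xi)$
are nonsingular. Then the directional derivative of $\Phi$ at the design $\xi$ in the
direction of $\nu - \xi$ is given by

(3.1) $\left. \frac{\partial \Phi(D(\xi_\alpha))}{\partial \alpha} \right|_{\alpha=0} = 2 [b(\nu, \xi) - \varphi(\nu, \xi)]$,

where

$\varphi(\nu, \xi) = \mathrm{tr}(M(\nu) D(\xi) C(\xi) M^{-1}(\xi)), \quad b(\nu, \xi) = \mathrm{tr}(C(\xi) M^{-1}(\xi) B(\xi, \nu) M^{-1}(\xi))$,

and

$C(\xi) = \left. \frac{\partial \Phi(D)}{\partial D} \right|_{D=D(\xi)}$.

Proof. Straightforward calculation shows that

$\left. \frac{\partial}{\partial \alpha} M^{-1}(\xi_\alpha) \right|_{\alpha=0} = M^{-1}(\xi) - M^{-1}(\xi) M(\nu) M^{-1}(\xi)$

and

$\left. \frac{\partial}{\partial \alpha} B(\xi_\alpha, \xi_\alpha) \right|_{\alpha=0} = B(\xi, \nu) + B(\nu, \xi) - 2 B(\xi, \xi)$.

Using the formula for the derivative of a product and the two formulas above,
we obtain

$\left. \frac{\partial}{\partial \alpha} D(\xi_\alpha) \right|_{\alpha=0} = M^{-1}(\xi) [B(\xi, \nu) + B(\nu, \xi)] M^{-1}(\xi) - M^{-1}(\xi) M(\nu) D(\xi) - D(\xi) M(\nu) M^{-1}(\xi)$.

Note that the matrices $M(\xi_\alpha)$ and $B(\xi_\alpha, \xi_\alpha)$ are nonsingular for small
nonnegative $\alpha$ (i.e., for all $\alpha \in [0, \alpha_0)$ where $\alpha_0$ is a small positive number) which
follows from the nondegeneracy of $M(\xi)$ and $B(\xi, \xi)$ and the continuity of
$M(\xi_\alpha)$ and $B(\xi_\alpha, \xi_\alpha)$ with respect to $\alpha$.

Using the above formula and the fact that $\mathrm{tr}(H(A + A^T)) = 2 \mathrm{tr}(HA)$ for
any $m \times m$ matrix $A$ and any $m \times m$ symmetric matrix $H$, we obtain

$\left. \frac{\partial \Phi(D(\xi_\alpha))}{\partial \alpha} \right|_{\alpha=0} = \mathrm{tr}\left( C(\xi) \left. \frac{\partial D(\xi_\alpha)}{\partial \alpha} \right|_{\alpha=0} \right) = 2 [b(\nu, \xi) - \varphi(\nu, \xi)]$.

□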
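
The directional derivative formula (3.1) can be checked against a finite difference for discrete designs. The following sketch assumes the criterion $\Phi(D) = \ln\det(D)$, for which $C(\xi) = D^{-1}(\xi)$, a quadratic model, the kernel $K(u, v) = e^{-|u-v|}$ and two designs on a common five-point support; all of these choices and the function names are illustrative assumptions.

```python
import numpy as np

pts = np.linspace(-1.0, 1.0, 5)                          # common support (assumption)
F = np.column_stack([np.ones_like(pts), pts, pts**2])    # rows f(x_i)^T
K = np.exp(-np.abs(pts[:, None] - pts[None, :]))         # K(u, v) = exp(-|u - v|)

def M(w):
    """Information matrix M for a weight vector w on pts."""
    return F.T @ (w[:, None] * F)

def B(w, v):
    """Matrix B(xi, nu) for weight vectors w (xi) and v (nu) on pts."""
    return F.T @ (w[:, None] * K * v[None, :]) @ F

def D(w):
    M_inv = np.linalg.inv(M(w))
    return M_inv @ B(w, w) @ M_inv

def log_det_D(w):
    return np.linalg.slogdet(D(w))[1]

xi = np.full(5, 0.2)
nu = np.array([0.4, 0.1, 0.0, 0.1, 0.4])

# Right-hand side of (3.1) with C(xi) = D(xi)^{-1}.
M_inv = np.linalg.inv(M(xi))
C = np.linalg.inv(D(xi))
b_val = np.trace(C @ M_inv @ B(xi, nu) @ M_inv)
phi_val = np.trace(M(nu) @ D(xi) @ C @ M_inv)
formula = 2.0 * (b_val - phi_val)

# Central finite difference of alpha -> Phi(D(xi_alpha)) at alpha = 0.
h = 1e-6
numeric = (log_det_D((1 - h) * xi + h * nu) - log_det_D((1 + h) * xi - h * nu)) / (2 * h)
```

Both designs share the same support so that the perturbed weights stay positive for small $|\alpha|$, which keeps the central difference well defined.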



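Note that the functions $b(\nu, \xi)$ and $\varphi(\nu, \xi)$ can be represented as

$b(\nu, \xi) = \int b(x, \xi) \nu(dx), \quad \varphi(\nu, \xi) = \int \varphi(x, \xi) \nu(dx)$,

where

(3.2) $\varphi(x, \xi) = \varphi(\xi_x, \xi) = f^T(x) D(\xi) C(\xi) M^{-1}(\xi) f(x)$,

(3.3) $b(x, \xi) = b(\xi_x, \xi) = \mathrm{tr}(C(\xi) M^{-1}(\xi) B(\xi, \xi_x) M^{-1}(\xi))$,

and $\xi_x$ is the probability measure concentrated at a point $x$.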

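Lemma 3.2. For any design $\xi$ such that the matrices $M(\xi)$ and $B(\xi, \xi)$
are nonsingular we have

(3.4) $\int \varphi(x, \xi) \xi(dx) = \int b(x, \xi) \xi(dx) = \mathrm{tr}(D(\xi) C(\xi))$,

where the functions $\varphi(x, \xi)$ and $b(x, \xi)$ are defined in (3.2) and (3.3),
respectively.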



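Proof. Straightforward calculation shows that

$\int \varphi(x, \xi) \xi(dx) = \mathrm{tr}\left( D(\xi) C(\xi) M^{-1}(\xi) \int f(x) f^T(x) \xi(dx) \right) = \mathrm{tr}(D(\xi) C(\xi))$.

We also have

$\int B(\xi, \xi_x) \xi(dx) = \int \left[ \int K(u, x) f(u) f^T(x) \xi(du) \right] \xi(dx) = B(\xi, \xi)$,

which implies

$\int b(x, \xi) \xi(dx) = \mathrm{tr}\left( M^{-1}(\xi) C(\xi) M^{-1}(\xi) \int B(\xi, \xi_x) \xi(dx) \right) = \mathrm{tr}(D(\xi) C(\xi))$. □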

The first main result of this section provides a necessary condition for the 
optimality of a given design. 

Theorem 3.1. Let $\xi^*$ be any design minimizing the functional $\Phi(D(\xi))$.
Then the inequality

(3.5) $\varphi(x, \xi^*) \le b(x, \xi^*)$

holds for all $x \in \mathcal{X}$, where the functions $\varphi(x, \xi)$ and $b(x, \xi)$ are defined in
(3.2) and (3.3), respectively. Moreover, there is equality in (3.5) for $\xi^*$-almost
all $x$, that is, $\xi^*(A) = 0$ where

$A = A(\xi^*) = \{x \in \mathcal{X} \mid \varphi(x, \xi^*) < b(x, \xi^*)\}$

is the set of $x \in \mathcal{X}$ such that the inequality (3.5) is strict.

Proof. Consider any design $\xi^*$ minimizing the functional $\Phi(D(\xi))$. The
necessary condition for an element to be a minimizer of a differentiable
functional states that the directional derivative from this element in any
direction is nonnegative. In the case of the design $\xi^*$ and the functional
$\Phi(D(\xi))$ this yields for any design $\nu$

$\left. \frac{\partial \Phi(D(\xi_\alpha))}{\partial \alpha} \right|_{\alpha=0} \ge 0$,

where $\xi_\alpha = (1 - \alpha) \xi^* + \alpha \nu$. Inequality (3.5) follows now from Lemma 3.1
applied with $\nu = \xi_x$. The
assumption that inequality (3.5) is strict for all $x \in A$ with $\xi^*(A) > 0$ is in
contradiction with identity (3.4). □



Remark 3.1. In the classical theory of optimal design, convex optimality
criteria are almost always considered. However, in at least one paper,
namely Torsney (1986), an optimality theorem for rather general nonconvex
optimality criteria was established and used (in the case of noncorrelated
observations).

3.2. An alternative representation of the necessary condition of optimality.
For a given design $\xi \in \Xi$, introduce the vector-valued function

(3.6) $g(x) = \int K(x, u) f(u) \xi(du) - \Lambda f(x), \quad x \in \mathcal{X}$,

where $\Lambda = B(\xi, \xi) M^{-1}(\xi)$. This function satisfies the equality

(3.7) $\int g(x) f^T(x) \xi(dx) = 0$.

Additionally, as the vector of regression functions $f(\cdot)$ is continuous on $\mathcal{X}$,
the function $g(\cdot)$ is continuous too.
Note that $f_1, \ldots, f_m \in L_2(\mathcal{X}, \xi)$ where

$L_2(\mathcal{X}, \xi) = \left\{ h : \mathcal{X} \to \mathbb{R} \,\Big|\, \int h^2(x) \xi(dx) < \infty \right\}$.

Formula (3.6) implies that $g(x)$ is the residual obtained after componentwise
projection of the vector-valued function $\int K(x, u) f(u) \xi(du)$ onto the
subspace $\mathrm{span}\{f_1, \ldots, f_m\} \subset L_2(\mathcal{X}, \xi)$.

Using (3.6), (3.7) and the symmetry of the matrix $B(\xi, \xi)$ we obtain

$B(\xi, \xi) = \int \left[ \int K(x, u) f(u) \xi(du) \right] f^T(x) \xi(dx)$
$= \int \Lambda f(x) f^T(x) \xi(dx) + \int g(x) f^T(x) \xi(dx)$
$= \Lambda M(\xi) = M(\xi) \Lambda^T$,

which gives for the matrix $D$ in (2.7),

$D(\xi) = M^{-1}(\xi) B(\xi, \xi) M^{-1}(\xi) = M^{-1}(\xi) \Lambda = \Lambda^T M^{-1}(\xi)$.

For the function (3.2), we obtain

$\varphi(x, \xi) = f^T(x) D(\xi) C(\xi) M^{-1}(\xi) f(x)$
$= f^T(x) \Lambda^T M^{-1}(\xi) C(\xi) M^{-1}(\xi) f(x)$
$= f^T(x) M^{-1}(\xi) C(\xi) M^{-1}(\xi) \Lambda f(x)$.

We also have

$B(\xi, \xi_x) = \int K(x, u) f(u) \xi(du) f^T(x) = \Lambda f(x) f^T(x) + g(x) f^T(x)$,

which gives for function (3.3)

$b(x, \xi) = \mathrm{tr}(C(\xi) M^{-1}(\xi) B(\xi, \xi_x) M^{-1}(\xi)) = \varphi(x, \xi) + r(x, \xi)$,

where the function $r$ is defined by

$r(x, \xi) = f^T(x) M^{-1}(\xi) C(\xi) M^{-1}(\xi) g(x)$.

The following result is now an obvious corollary of Theorem 3.1.

Corollary 3.1. If a design $\xi$ is optimal, then $r(x, \xi) \ge 0$ for all $x \in \mathcal{X}$
and $r(x, \xi) = 0$ for all $x$ in the support of the measure $\xi$.

3.3. D-optimality. For the D-optimality there exists an analogue of the
celebrated "equivalence theorem" of Kiefer and Wolfowitz (1960), which
characterizes optimal designs minimizing the D-optimality criterion
$\Phi(D(\xi)) = \ln\det(D(\xi))$.

Theorem 3.2. Let $\xi^*$ be any D-optimal design. Then for all $x \in \mathcal{X}$ we
have

(3.8) $d(x, \xi^*) \le b(x, \xi^*)$,

where the functions $d$ and $b$ are defined by $d(x, \xi) = f^T(x) M^{-1}(\xi) f(x)$ and

(3.9) $b(x, \xi) = \mathrm{tr}(B^{-1}(\xi, \xi) B(\xi, \xi_x)) = f^T(x) B^{-1}(\xi, \xi) \int K(u, x) f(u) \xi(du)$,

respectively. Moreover, there is equality in (3.8) for $\xi^*$-almost all $x$.

Proof. In the case of the D-optimality criterion $\Phi(D(\xi)) = \ln\det(D(\xi))$,
we have $C(\xi) = D^{-1}(\xi)$, which gives

$\varphi(x, \xi) = f^T(x) D(\xi) D^{-1}(\xi) M^{-1}(\xi) f(x) = d(x, \xi)$.

Similarly, we simplify an expression for $b(x, \xi)$. Reference to Theorem 3.1
completes the proof. □

Note that the function $r(x, \xi)$ for the D-criterion is given by

$r(x, \xi) = f^T(x) B^{-1}(\xi, \xi) g(x)$

and, consequently, the necessary condition of the D-optimality can be written
as $f^T(x) B^{-1}(\xi, \xi) g(x) \ge 0$ for all $x \in \mathcal{X}$.

The following statement illustrates a remarkable similarity between D- 
optimal design problems in the cases of correlated and noncorrelated obser- 
vations. The proof easily follows from Lemma 3.2 and Theorem 3.2. 

Corollary 3.2. For any design $\xi$ such that the matrices $M(\xi)$ and
$B(\xi, \xi)$ are nonsingular we have

$\int d(x, \xi) \xi(dx) = \int b(x, \xi) \xi(dx) = m$,

where $b(x, \xi)$ is defined in (3.9) and $m$ is the number of parameters in the
regression model (1.1).

Fig. 1. The functions $b(x, \xi)$ and $d(x, \xi)$ for the regression model (1.1)
with $f(x) = (1, x, x^2)^T$ and the covariance kernels $K(u, v) = e^{-|u-v|}$ (left),
$K(u, v) = \max(0, 1 - |u - v|)$ (middle) and $K(u, v) = -\log(u - v)^2$ (right) and the
arcsine design $\xi_a$.
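
The identity of Corollary 3.2 is easy to confirm for a discrete design, where both integrals become weighted sums over the support points. The sketch below assumes a quadratic model, the triangular kernel and an arbitrarily chosen five-point design; the weights and variable names are hypothetical.

```python
import numpy as np

pts = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])               # support points (assumption)
w = np.array([0.3, 0.15, 0.1, 0.15, 0.3])                  # design weights, summing to 1
F = np.column_stack([np.ones_like(pts), pts, pts**2])      # rows f(x_i)^T
K = np.maximum(0.0, 1.0 - np.abs(pts[:, None] - pts[None, :]))  # triangular kernel

M = F.T @ (w[:, None] * F)
B = F.T @ (w[:, None] * K * w[None, :]) @ F
M_inv, B_inv = np.linalg.inv(M), np.linalg.inv(B)

# d(x_i, xi) = f^T(x_i) M^{-1}(xi) f(x_i)
d = np.einsum("ij,jk,ik->i", F, M_inv, F)

# Row i of Kf is the integral of K(u, x_i) f(u)^T with respect to xi(du).
Kf = (K * w[:, None]).T @ F
# b(x_i, xi) = f^T(x_i) B^{-1}(xi, xi) * (integral above), see (3.9)
b = np.einsum("ij,jk,ik->i", F, B_inv, Kf)

int_d = float(w @ d)     # integral of d(x, xi) with respect to xi
int_b = float(w @ b)     # integral of b(x, xi) with respect to xi
```

Both `int_d` and `int_b` equal the number of parameters $m = 3$ up to rounding, whatever weights are chosen, as long as $M(\xi)$ and $B(\xi, \xi)$ stay nonsingular.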

Example 3.1. Consider the quadratic regression model
$y(x) = \theta_1 + \theta_2 x + \theta_3 x^2 + \varepsilon(x)$ with design space $\mathcal{X} = [-1, 1]$. In Figure 1 we plot functions
$b(x, \xi)$ and $d(x, \xi)$ for the covariance kernels $K(u, v) = e^{-|u-v|}$,
$K(u, v) = \max\{0, 1 - |u - v|\}$ and $K(u, v) = -\log(u - v)^2$, where the design is the
arcsine distribution with density

(3.10) $p(x) = 1/(\pi \sqrt{1 - x^2}), \quad x \in (-1, 1)$.

Throughout this paper this design will be called "arcsine design" and denoted
by $\xi_a$. By the definition, the function $d(x, \xi)$ is the same for different
covariance kernels, but the function $b(x, \xi)$ depends on the choice of
the kernel. From the left and middle panel we see that the arcsine design
does not satisfy the necessary condition of Theorem 3.1 for the kernels
$K(u, v) = e^{-|u-v|}$ and $\max\{0, 1 - |u - v|\}$ and is therefore not D-optimal
for the quadratic regression model. On the other hand, for the logarithmic
kernel $K(u, v) = -\log(u - v)^2$ the necessary condition is satisfied, and the
arcsine design $\xi_a$ is a candidate for the D-optimal design. We will show in
Theorem 4.5 that the design $\xi_a$ is universally optimal and as a consequence
optimal with respect to a broad class of criteria including the D-optimality
criterion.

3.4. c-optimality. For the c-optimality criterion $\Phi(D(\xi)) = c^T D(\xi) c$, we
have $C(\xi) = c c^T$. Consequently,

$\varphi(x, \xi) = f^T(x) M^{-1}(\xi) c c^T M^{-1}(\xi) \Lambda f(x) = c^T M^{-1}(\xi) \Lambda f(x) f^T(x) M^{-1}(\xi) c$

and

$r(x, \xi) = b(x, \xi) - \varphi(x, \xi) = f^T(x) M^{-1}(\xi) c c^T M^{-1}(\xi) g(x)$.

Therefore, the necessary condition for c-optimality simplifies to

(3.11) $f^T(x) M^{-1}(\xi) c c^T M^{-1}(\xi) g(x) \ge 0$ for all $x \in \mathcal{X}$.

Fig. 2. The c-optimal design for the quadratic model and the triangular correlation
function, where $c = (1, 0, 0)^T$.

Example 3.2. Consider again the quadratic regression model
$y(x) = \theta_1 + \theta_2 x + \theta_3 x^2 + \varepsilon(x)$ with design space $\mathcal{X} = [-1, 1]$. Assume the triangular
correlation function $\rho(x) = \max\{0, 1 - |x|\}$.

Let $\xi = \{-1, 0, 1; 1/3, 1/3, 1/3\}$ be the design assigning weights 1/3 to the
points $-1$, $0$ and $1$. For this design, we have the matrices $M(\xi)$ and $D(\xi)$

$M(\xi) = \begin{pmatrix} 1 & 0 & 2/3 \\ 0 & 2/3 & 0 \\ 2/3 & 0 & 2/3 \end{pmatrix}, \quad D(\xi) = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1/2 & 0 \\ -1 & 0 & 3/2 \end{pmatrix}$,

and the matrix $\Lambda$ and the vector $g$ are given by

$\Lambda = \mathrm{diag}(1/3, 1/3, 1/3), \quad g(x) = (0, 0, (|x| - x^2)/3)^T$,

where we used $\int K(x, u) f(u) \xi(du) = (1/3, x/3, |x|/3)^T$ in (3.6).

If $c = (0, 1, 0)^T$, then $r(x, \xi) = 0$ for all $x \in [-1, 1]$ and thus the design $\xi$
satisfies the necessary condition for c-optimality in (3.11). If $c = (1, 0, 1)^T$,
then $r(x, \xi) = \frac{3}{4} |x|^3 (1 - |x|) \ge 0$ for all $x \in [-1, 1]$ and the design $\xi$ also
satisfies (3.11). The corresponding functions $b$ and $\varphi$ are displayed in the
left and middle panels of Figure 3. Numerical analysis shows that for both
vectors this design is in fact c-optimal. However, it is not optimal for every
c-optimality criterion. For example, if $c = (1, 0, 0)^T$, then
$r(x, \xi) = -3 |x| (1 - |x|)(1 - x^2) \le 0$ for all $x \in [-1, 1]$, showing that the design is not c-optimal;
see the middle panel of Figure 3. For this case, the density function of the
c-optimal design is displayed in Figure 2. The corresponding functions $b$ and
$\varphi$ are shown in the right panel of Figure 3.

Fig. 3. The functions $b(x, \xi)$ and $\varphi(x, \xi)$ for the c-optimality criterion. (a): $c = (1, 0, 1)^T$,
design $\xi = \{-1, 0, 1; 1/3, 1/3, 1/3\}$; (b): $c = (1, 0, 0)^T$, design $\xi = \{-1, 0, 1; 1/3, 1/3, 1/3\}$;
(c): $c = (1, 0, 0)^T$, design is displayed in Figure 2.
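
The quantities of Example 3.2 can be reproduced numerically from the definitions (2.5) to (2.7), (3.6) and (3.11). The sketch below is a direct transcription for the three-point design and the triangular correlation function; the function and variable names are hypothetical.

```python
import numpy as np

def f(x):
    return np.array([1.0, x, x * x])       # quadratic regression functions

def rho(t):
    return max(0.0, 1.0 - abs(t))          # triangular correlation function

support = [-1.0, 0.0, 1.0]
weights = [1/3, 1/3, 1/3]

M = sum(w * np.outer(f(u), f(u)) for u, w in zip(support, weights))
B = sum(wu * wv * rho(u - v) * np.outer(f(u), f(v))
        for u, wu in zip(support, weights)
        for v, wv in zip(support, weights))
M_inv = np.linalg.inv(M)
D = M_inv @ B @ M_inv                      # covariance matrix (2.7)
Lam = B @ M_inv                            # matrix Lambda from (3.6)

def g(x):
    """Residual (3.6): integral of K(x, u) f(u) xi(du) minus Lambda f(x)."""
    smoothed = sum(w * rho(x - u) * f(u) for u, w in zip(support, weights))
    return smoothed - Lam @ f(x)

def r(x, c):
    """Left-hand side of the c-optimality condition (3.11)."""
    c = np.asarray(c, dtype=float)
    return float(f(x) @ M_inv @ c) * float(c @ M_inv @ g(x))

grid = np.linspace(-1.0, 1.0, 41)
r_010 = np.array([r(x, [0, 1, 0]) for x in grid])
r_101 = np.array([r(x, [1, 0, 1]) for x in grid])
r_100 = np.array([r(x, [1, 0, 0]) for x in grid])
```

On the grid, `r_010` vanishes, `r_101` is nonnegative and `r_100` is nonpositive, which matches the closed-form expressions for $r(x, \xi)$ stated in the example.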

3.5. Universal optimality. In this section we consider the matrix $D(\xi)$
defined in (2.7) as the matrix optimality criterion which we are going to
minimize on the set $\Xi$ of all designs, such that the matrices $B(\xi, \xi)$ and
$M^{-1}(\xi)$ [and therefore the matrix $D(\xi)$] are well defined. Recall that a design
$\xi^*$ is universally optimal if $D(\xi^*) \le D(\xi)$ in the sense of the Loewner ordering
for any design $\xi \in \Xi$. Note that a design $\xi^*$ is universally optimal if and only
if $\xi^*$ is c-optimal for any vector $c \in \mathbb{R}^m \setminus \{0\}$; that is, $c^T D(\xi^*) c \le c^T D(\xi) c$
for any $\xi \in \Xi$ and any $c \in \mathbb{R}^m$.

Theorem 3.3. Consider the regression model (1.1) with a covariance
kernel $K$, a design $\xi \in \Xi$ and the corresponding vector-function $g(\cdot)$
defined in (3.6).

(a) If $g(x) = 0$ for all $x \in \mathcal{X}$, then the design $\xi$ is universally optimal;

(b) If the design $\xi$ is universally optimal, then the function $g(\cdot)$ can be
represented in the form $g(x) = \gamma(x) f(x)$, where $\gamma(x)$ is a nonnegative function
defined on $\mathcal{X}$ such that $\gamma(x) = 0$ for all $x$ in the support of the design $\xi$.

In the proof of Theorem 3.3 we shall need the following two auxiliary 
results which will be proved in the Appendix. 

Lemma 3.3. Let $c \in \mathbb{R}^m$, and $\mathcal{M}$ be the set of all signed vector measures
supported on $\mathcal{X}$. Then the functional $\Phi_c : \mathcal{M} \to \mathbb{R}_+$ defined by

(3.12) $\Phi_c(\mu) = c^T \int\!\!\int K(x, u) \mu(dx) \mu^T(du) c$

is convex.

Lemma 3.4. Let $m > 1$ and $a, b \in \mathbb{R}^m$ be two linearly independent vectors.
Then there exists a vector $c \in \mathbb{R}^m$ such that $S_c = c^T a b^T c < 0$.

Proof of Theorem 3.3. Consider the regression model $y(x) = f^T(x) \theta + \varepsilon(x)$,
where the full trajectory $\{y(x) \mid x \in \mathcal{X}\}$ can be observed. Let
$\hat\theta(\mu) = \int y(x) \mu(dx)$ be a general linear unbiased estimate of the parameter $\theta$, where
$\mu = (\mu_1, \ldots, \mu_m)^T$ is a vector of signed measures. For example, the least
squares estimate for a design $\xi$ in this model is obtained as $\hat\theta(\mu_\xi)$, where
$\mu_\xi(dx) = M^{-1}(\xi) f(x) \xi(dx)$. The condition of unbiasedness of the estimate
$\hat\theta(\mu)$ means that

$\theta = E[\hat\theta(\mu)] = E\left[ \int \mu(dx) y(x) \right] = \int \mu(dx) f^T(x) \theta$

for all $\theta \in \mathbb{R}^m$, which is equivalent to the condition

(3.13) $\int \mu(dx) f^T(x) = \int f(x) \mu^T(dx) = I_m$,

where $I_m$ denotes the $m \times m$ identity matrix. In the following discussion we
define $\mathcal{M}_0$ as a subset of $\mathcal{M}$ containing the signed measures which satisfy
condition (3.13). Note that both sets, $\mathcal{M}$ and $\mathcal{M}_0$, are convex.

For a given vector $c \in \mathbb{R}^m$, the variance of the estimate $c^T \hat\theta(\mu)$ is given by

$\mathrm{Var}(c^T \hat\theta(\mu)) = c^T \int\!\!\int E[\varepsilon(x) \varepsilon(u)] \mu(dx) \mu^T(du) c$
$= c^T \int\!\!\int K(x, u) \mu(dx) \mu^T(du) c = \Phi_c(\mu)$

and a minimizer of this expression with respect to $\mu \in \mathcal{M}_0$ determines the
best linear unbiased estimate for $c^T \theta$ and the corresponding c-optimal design
simultaneously.

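Note that the sets $\mathcal{M}$ and $\mathcal{M}_0$ are convex and, in view of Lemma 3.3, the functional $\Phi_c(\mu)$ defined in (3.12) is convex on $\mathcal{M}$. Arguments similar to those given in Section 3.1 show that the directional derivative of $\Phi_c$ at $\mu^*$ in the direction of $\nu - \mu^*$ is given by

$$\frac{\partial}{\partial\alpha}\Phi_c(\mu_\alpha)\Big|_{\alpha=0} = \frac{\partial}{\partial\alpha}\Phi_c((1-\alpha)\mu^* + \alpha\nu)\Big|_{\alpha=0} = 2c^T\Big[\int\!\!\int K(x,u)\,\mu^*(dx)\,\nu^T(du) - \int\!\!\int K(x,u)\,\mu^*(dx)\,\mu^{*T}(du)\Big]c,$$

where $\mu_\alpha = (1-\alpha)\mu^* + \alpha\nu$. Because $\Phi_c$ is convex, the optimality of $\mu^*$ in the set $\mathcal{M}_0$ is equivalent to the condition $\frac{\partial}{\partial\alpha}\Phi_c(\mu_\alpha)|_{\alpha=0} \ge 0$ for all $\nu \in \mathcal{M}_0$. Therefore, the signed measure $\mu^* \in \mathcal{M}_0$ minimizes the functional $\Phi_c(\mu)$ if and only if the inequality

(3.14) $$\Phi_c(\mu^*, \nu) \ge \Phi_c(\mu^*)$$

holds for all $\nu \in \mathcal{M}_0$, where

$$\Phi_c(\mu^*, \nu) := c^T \int\!\!\int K(x,u)\,\mu^*(dx)\,\nu^T(du)\,c.$$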

Let us prove part (a) of Theorem 3.3. Consider a design $\xi \in \Xi$ such that $g(x) = 0$ for all $x \in \mathcal{X}$ and define the vector-valued measure $\mu_0(dx) = M^{-1}(\xi) f(x)\,\xi(dx)$. It follows for all $\nu \in \mathcal{M}_0$,

$$\Phi_c(\mu_0, \nu) = c^T \int\!\!\int K(x,u)\,M^{-1}(\xi) f(x)\,\xi(dx)\,\nu^T(du)\,c = c^T M^{-1}(\xi)\Lambda \int f(u)\,\nu^T(du)\,c = c^T M^{-1}(\xi)\Lambda c,$$

where we used (3.13) for the measure $\nu$ in the last identity. On the other hand, $\mu_0 \in \mathcal{M}_0$ also satisfies (3.13), and we obtain once more using identity (3.6) with $g(x) = 0$,

$$\Phi_c(\mu_0) = c^T \int\!\!\int K(x,u)\,\mu_0(dx)\,\mu_0^T(du)\,c = c^T M^{-1}(\xi)\int\Big[\int K(x,u) f(x)\,\xi(dx)\Big]\mu_0^T(du)\,c = c^T M^{-1}(\xi)\Lambda\int f(u)\,\mu_0^T(du)\,c = c^T M^{-1}(\xi)\Lambda c.$$

This yields that for $\mu^* = \mu_0$ we have equality in (3.14) for all $\nu \in \mathcal{M}_0$, which shows that the vector-valued measure $\mu_0(dx) = M^{-1}(\xi) f(x)\,\xi(dx)$ minimizes the function $\Phi_c$ for any $c \ne 0$ over the set $\mathcal{M}_0$ of signed vector-valued measures satisfying (3.13).

Now we return to the minimization of the function $D(\eta)$ in the class of all designs $\eta \in \Xi$. For any $\eta \in \Xi$, define the corresponding vector-valued measure $\mu_\eta(dx) = M^{-1}(\eta) f(x)\,\eta(dx)$ and note that $\mu_\eta \in \mathcal{M}_0$. We obtain

$$c^T D(\eta) c = c^T M^{-1}(\eta) B(\eta,\eta) M^{-1}(\eta) c = \Phi_c(\mu_\eta) \ge \min_{\mu \in \mathcal{M}_0} \Phi_c(\mu) = \Phi_c(\mu_0) = c^T D(\xi) c.$$

Since the design $\xi$ does not depend on the particular vector $c$, it follows that $\xi$ is universally optimal.

Let us now prove (b) of Theorem 3.3. Assume that the design $\xi$ is universally optimal and let $g(x)$ be the function associated with this design and computed by (3.6).

Consider first the case $m = 1$. In this case, the assumption that $\xi$ is a universally optimal design coincides with the assumption of simple optimality. Also, since $f(x) \ne 0$ for all $x \in \mathcal{X}$, we can define $\gamma(x) = g(x)/f(x)$ for all $x \in \mathcal{X}$. In this notation, the statement (b) of Theorem 3.3 coincides with the statement of Corollary 3.1.

Assume now $m > 1$. Since the design $\xi$ is universally optimal it is $c$-optimal for any vector $c$ and therefore the necessary condition for $c$-optimality should be satisfied; this condition is

$$r_c(x,\xi) = c^T M^{-1}(\xi)\, g(x) f^T(x)\, M^{-1}(\xi)\, c \ge 0$$

for all $x \in \mathcal{X}$ and $r_c(x,\xi) = 0$ for all $x$ in the support of the measure $\xi$. If $g(x) = \gamma(x) f(x)$, where $\gamma(x) \ge 0$, then

$$r_c(x,\xi) = \gamma(x)\,[c^T M^{-1}(\xi) f(x) f^T(x) M^{-1}(\xi) c] \ge 0$$

for any vector $c$ and all $x$ so that the necessary condition for $c$-optimality is satisfied. On the other hand, if $g(x) = \gamma(x) f(x)$, but $\gamma(x_0) < 0$ for some $x_0 \in \mathcal{X}$, then [in view of the fact that the matrix $M^{-1}(\xi) f(x_0) f^T(x_0) M^{-1}(\xi)$ is nonzero] there exists $c$ such that $r_c(x_0,\xi) < 0$ and the necessary condition for $c$-optimality of the design $\xi$ is not satisfied.

Furthermore, if the representation $g(x) = \gamma(x) f(x)$ does not hold, then there exists a point $x_0 \in \mathcal{X}$ such that $g(x_0) \ne 0$ and $g(x_0)$ is not proportional to $f(x_0)$ [recall also that $f(x) \ne 0$ for all $x \in \mathcal{X}$]. Then

$$r_c(x_0,\xi) = c^T M^{-1}(\xi)\, g(x_0) f^T(x_0)\, M^{-1}(\xi)\, c = c^T a b^T c$$

with $a = M^{-1}(\xi) g(x_0)$ and $b = M^{-1}(\xi) f(x_0)$. Using Lemma 3.4, we deduce that there exists a vector $c$ such that $r_c(x_0,\xi) < 0$. Therefore the design $\xi$ is not $c$-optimal and as a consequence also not universally optimal. □

In the one-parameter case ($m = 1$) it is easy to construct examples where the function $g(x)$ corresponding to the optimal design is nonzero. For example, consider the regression model $y(t) = \theta t + \varepsilon(t)$, $t \in [-1, 1]$, with the so-called spherical correlation function

$$\rho(u) = 1 - \frac{3}{2}\,\frac{|u|}{R} + \frac{1}{2}\Big(\frac{|u|}{R}\Big)^3$$

with $R = 2$. Then the design assigning weights 0.5 to the points $-1$ and $1$ is optimal. For this design, the function $g(x)$ defined in (3.6) is equal to $g(x) = x(1 - x^2)/16$, while the function $\gamma(x)$ is $\gamma(x) = (1 - x^2)/16$, $x \in [-1, 1]$.
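The closed form of $g$ in this example can be checked numerically. The sketch below assumes that (3.6) has the form $g(x) = \int K(x,u) f(u)\,\xi(du) - \Lambda f(x)$, which is how (3.6) is used in the proof of Theorem 3.3, and that the scalar $\Lambda$ equals $1/2$ here; that value is inferred from the stated formula for $g$ and is not given in the text. The function names are illustrative only.

```python
def rho(u, R=2.0):
    """Spherical correlation function with range R (zero beyond the range)."""
    r = abs(u) / R
    return 1.0 - 1.5 * r + 0.5 * r ** 3 if r <= 1.0 else 0.0


def kernel_integral(x):
    """int K(x, u) f(u) xi(du) for f(u) = u and xi = {-1, 1; 0.5, 0.5}."""
    return 0.5 * (rho(x - 1.0) * 1.0 + rho(x + 1.0) * (-1.0))


LAMBDA = 0.5  # assumed constant in (3.6), inferred from g(x) = x(1 - x^2)/16


def g(x):
    """g(x) under the assumed form of (3.6)."""
    return kernel_integral(x) - LAMBDA * x


def gamma(x):
    """gamma(x) = g(x) / f(x) for x != 0, with f(x) = x."""
    return g(x) / x


for x in (-1.0, -0.5, 0.3, 1.0):
    print(x, g(x), x * (1.0 - x * x) / 16.0)
```

Under these assumptions the two printed columns agree, $g$ vanishes at the support points $\pm 1$, and $\gamma$ is nonnegative on $[-1,1]$, in line with part (b) of Theorem 3.3.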

4. Optimal designs for specific kernels and models. 

4.1. Optimality and Mercer's theorem. In this section we consider the case when the regression functions are proportional to eigenfunctions from Mercer's theorem. To be precise, let $\mathcal{X}$ denote a compact subset of a metric space, and let $\nu$ denote a measure on the corresponding Borel field with positive density. Consider the integral operator

(4.1) $$T_K(f)(\cdot) = \int_{\mathcal{X}} K(\cdot, u) f(u)\,\nu(du)$$

on $L_2(\nu)$. Under certain assumptions on the kernel [e.g., if $K(u,v)$ is symmetric, continuous and positive definite] $T_K$ defines a symmetric, compact self-adjoint operator. In this case Mercer's theorem [see, e.g., Kanwal (1997)] shows that there exist a countable number of eigenfunctions $\varphi_1, \varphi_2, \ldots$ with positive eigenvalues $\lambda_1, \lambda_2, \ldots$ of the operator $T_K$, that is,

(4.2) $$T_K(\varphi_\ell) = \lambda_\ell \varphi_\ell, \qquad \ell = 1, 2, \ldots.$$

The next statement follows directly from Theorem 3.3.

Theorem 4.1. Let $\mathcal{X}$ be a compact subset of a metric space, and assume that the covariance kernel $K(x,u)$ defines an integral operator $T_K$ of the form (4.1), where the eigenfunctions satisfy (4.2). Consider the regression model (1.1) with $f(x) = L(\varphi_{i_1}(x), \ldots, \varphi_{i_m}(x))^T$ and the covariance kernel $K(x,u)$, where $L \in \mathbb{R}^{m \times m}$ is a nonsingular matrix. Then the design $\nu$ is universally optimal.

We note that the Mercer expansion is known analytically for certain covariance kernels. For example, if $\nu$ is the uniform distribution on the interval $\mathcal{X} = [-1, 1]$, and the covariance kernel is of exponential type, that is, $K(x,u) = e^{-\lambda|x-u|}$, then the eigenfunctions are given by

$$\varphi_k(x) = \sin(\omega_k x + k\pi/2), \qquad k \in \mathbb{N},$$

where $\omega_1, \omega_2, \ldots$ are positive roots of the equation $\tan(2\omega) = -2\lambda\omega/(\lambda^2 - \omega^2)$. Similarly, consider as a second example the covariance kernel $K(x,u) = \min\{x, u\}$ and $\mathcal{X} = [0, 1]$. In this case, the eigenfunctions of the corresponding integral operator are given by

$$\varphi_k(x) = \sin((k + 1/2)\pi x), \qquad k \in \mathbb{N}.$$

In the following subsection we provide a further example of the applica- 
tion of Mercer's theorem, which is of importance for series estimation in 
nonparametric regression. 
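Before turning to the periodic case, the second Mercer example of Section 4.1 can be checked numerically. The sketch below approximates $\int_0^1 \min\{x,u\}\,\varphi_k(u)\,du$ by a midpoint rule and compares it with $\varphi_k(x)/\omega_k^2$, where $\omega_k = (k + 1/2)\pi$. The eigenvalue $1/\omega_k^2$ follows from a direct integration and is not stated in the text, and the helper names are illustrative.

```python
import math


def apply_min_kernel(phi, x, n=20000):
    """Midpoint-rule approximation of int_0^1 min(x, u) * phi(u) du."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += min(x, u) * phi(u)
    return total * h


def eigen_pair(k, x):
    """Return (T_K phi_k)(x) and phi_k(x) / omega_k^2 for comparison."""
    omega = (k + 0.5) * math.pi

    def phi(u):
        return math.sin(omega * u)

    return apply_min_kernel(phi, x), phi(x) / omega ** 2


for k in (1, 2, 3):
    print(k, eigen_pair(k, 0.37))
```

The two numbers in each printed pair should agree up to the quadrature error, which illustrates that these sine functions satisfy (4.2) for the kernel $\min\{x,u\}$ with the uniform measure on $[0,1]$.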

4.2. Uniform design for periodic covariance functions. Consider the regression functions

(4.3) $$f_j(x) = \begin{cases} 1, & \text{if } j = 1, \\ \sqrt{2}\cos(2\pi(j-1)x), & \text{if } j \ge 2, \end{cases}$$

and the design space $\mathcal{X} = [0,1]$. Linear models of the form (1.1) with regression functions (4.3) are widely applied in series estimation of a nonparametric regression function [see, e.g., Efromovich (1999, 2008) or Tsybakov (2009)]. Assume that the correlation function $\rho(x)$ is periodic with period 1, that is, $\rho(x) = \rho(x + 1)$, and let a covariance kernel be defined by $K(u,v) = \sigma^2\rho(u - v)$ with $\sigma^2 = 1$. An example of a correlation function $\rho(x)$ satisfying this property is provided by a convex combination of the functions $\{\cos(2\pi x), \cos^2(2\pi x), \ldots\}$.

Theorem 4.2. Consider regression model (1.1) with regression functions $f_{i_1}(x), \ldots, f_{i_m}(x)$ ($1 \le i_1 < \cdots < i_m$) defined in (4.3) and a correlation function $\rho(t)$ that is periodic with period 1. Then the uniform design is universally optimal.

Proof. We will show that the identity

(4.4) $$\int_0^1 K(u,v) f_j(u)\,du = \int_0^1 \rho(u - v) f_j(u)\,du = \lambda_j f_j(v)$$

holds for all $v \in [0,1]$, where $\lambda_j = \int_0^1 \rho(u)\cos(2\pi(j-1)u)\,du$ ($j \ge 1$). The assertion then follows from Theorem 4.1.

To prove (4.4), we define $A_j(v) = \int_0^1 \rho(u - v) f_j(u)\,du$, which should be shown to be $\lambda_j f_j(v)$. For $j = 1$ we have $A_1(v) = \lambda_1$ because $\int_0^1 \rho(u - v)\,du = \int_0^1 \rho(u)\,du = \lambda_1$ by the periodicity of the function $\rho(x)$. For $j = 2, 3, \ldots$ we note that

$$A_j(v) = \int_0^1 \rho(u - v) f_j(u)\,du = \int_{-v}^{1-v} f_j(u + v)\rho(u)\,du = \int_0^{1-v} f_j(u + v)\rho(u)\,du + \int_{-v}^{0} f_j(u + v)\rho(u)\,du.$$

Because of the periodicity we have

$$\int_{-v}^{0} f_j(u + v)\rho(u)\,du = \int_{1-v}^{1} f_j(u + v)\rho(u)\,du,$$

which gives $A_j(v) = \int_0^1 f_j(u + v)\rho(u)\,du$. A simple calculation now shows

(4.5) $$A_j''(v) = -b_j^2 A_j(v),$$

where $b_j^2 = (2\pi(j-1))^2$ and

$$A_j(0) = \sqrt{2}\int_0^1 \cos(2\pi(j-1)u)\rho(u)\,du = \sqrt{2}\lambda_j,$$

$$A_j'(0) = -b_j\sqrt{2}\int_0^1 \sin(2\pi(j-1)u)\rho(u)\,du = 0.$$

Therefore (from the theory of differential equations) the unique solution of (4.5) is of the form $A_j(v) = c_1\cos(b_j v) + c_2\sin(b_j v)$, where $c_1$ and $c_2$ are determined by the initial conditions, that is, $A_j(0) = c_1 = \sqrt{2}\lambda_j$, $A_j'(0) = b_j c_2 = 0$. This yields $A_j(v) = \lambda_j\sqrt{2}\cos(2\pi(j-1)v) = \lambda_j f_j(v)$ and proves identity (4.4). □
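Identity (4.4) can be illustrated numerically for one concrete periodic correlation function. The sketch below uses the equal-weight combination $\rho(x) = \tfrac12\cos(2\pi x) + \tfrac12\cos^2(2\pi x)$, which is an assumed example of the convex combinations mentioned above and not a choice made in the paper, and it takes $\lambda_j = \int_0^1 \rho(u)\cos(2\pi(j-1)u)\,du$. All function names are illustrative.

```python
import math


def rho(x):
    """Assumed example: equal-weight mix of cos(2 pi x) and cos^2(2 pi x)."""
    c = math.cos(2.0 * math.pi * x)
    return 0.5 * c + 0.5 * c * c


def f(j, x):
    """Regression functions (4.3)."""
    if j == 1:
        return 1.0
    return math.sqrt(2.0) * math.cos(2.0 * math.pi * (j - 1) * x)


def integrate(fn, n=2000):
    """Midpoint rule on [0, 1]."""
    h = 1.0 / n
    return sum(fn((i + 0.5) * h) for i in range(n)) * h


def lhs(j, v):
    """Left-hand side of (4.4): int_0^1 rho(u - v) f_j(u) du."""
    return integrate(lambda u: rho(u - v) * f(j, u))


def eigenvalue(j):
    """lambda_j = int_0^1 rho(u) cos(2 pi (j - 1) u) du."""
    return integrate(lambda u: rho(u) * math.cos(2.0 * math.pi * (j - 1) * u))


for j in (1, 2, 3, 4):
    print(j, eigenvalue(j), lhs(j, 0.23), eigenvalue(j) * f(j, 0.23))
```

For this particular $\rho$ the eigenvalues are $1/4$, $1/4$, $1/8$ and $0$ for $j = 1, \ldots, 4$, and the last two printed columns coincide for every $v$, as (4.4) asserts.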

4.3. Optimal designs for the triangular covariance function. Let us now consider the triangular correlation function defined by

(4.6) $$\rho(x) = \max\{0, 1 - \lambda|x|\}.$$

On the one hand this function arises as a correlation function of the process of increments of a Brownian motion which is in turn related to a Brownian bridge after a suitable conditioning is made; see Mehr and McFadden (1965). On the other hand it is motivated by the fact that for "small" values of the parameter $\lambda$, it provides a good approximation of the exponential correlation kernel $\rho_\lambda(x) = \exp(-\lambda|x|)$, which is widely used for modeling correlations in regression models; see Ucinski and Atkinson (2004) or Dette, Pepelyshev and Holland-Letz (2010), among others. For the exponential correlation kernel optimal designs are difficult to find, even in the linear regression model; see Dette, Kunert and Pepelyshev (2008). However, as the next theorem shows, it is possible to explicitly derive optimal designs for the linear model with a triangular correlation function. It will be demonstrated in Example 4.1 below that for "small" and "moderate" values of the parameter $\lambda$, these designs also provide an efficient solution of the design problem for the exponential correlation kernel.

Theorem 4.3. Consider model (1.1) with $f(x) = (1, x)^T$, $\mathcal{X} = [-1, 1]$ and the triangular correlation function (4.6).

(a) If $\lambda \in (0, 1/2]$, then the design $\xi^* = \{-1, 1; 1/2, 1/2\}$ is universally optimal.

(b) If $\lambda \in \mathbb{N}$, then the design supported at $2\lambda + 1$ points $x_k = -1 + k/\lambda$, $k = 0, 1, \ldots, 2\lambda$, with equal weights is universally optimal.

Proof. To prove part (a) we will show that $\int \rho(x - u) f_i(u)\,\xi^*(du) = \lambda_i f_i(x)$ for $i = 1, 2$ and some $\lambda_1$ and $\lambda_2$. By direct calculations we obtain for $f_1(x) = 1$

$$\int \rho(x - u)\,\xi^*(du) = \tfrac12\big[(1 - \lambda|x + 1|) + (1 - \lambda|x - 1|)\big] = 1 - \lambda$$

and, consequently, $\lambda_1 = 1 - \lambda$. Similarly, we have for $f_2(x) = x$,

$$\int \rho(x - u)\,u\,\xi^*(du) = \tfrac12\big[(1 - \lambda|x - 1|) - (1 - \lambda|x + 1|)\big] = \lambda x$$

and, therefore, $\lambda_2 = \lambda$. Thus, the assumptions of Theorem 3.3 are fulfilled.

Part (b). Straightforward but tedious calculations show that $M(\xi^*) = \mathrm{diag}(1, \gamma)$, where $\gamma = \sum_{k=0}^{2\lambda} x_k^2/(2\lambda + 1) = (\lambda + 1)/(3\lambda)$. Also we have

$$\int \rho(x - u) f_i(u)\,\xi^*(du) = \lambda_i f_i(x)$$

for $i = 1, 2$, where $\lambda_1 = \lambda_2 = 1/(2\lambda + 1)$. Thus, the assumptions of Theorem 3.3 are fulfilled. □

Table 1
D-efficiencies of the universally optimal design $\xi^* = \{-1, 1; 0.5, 0.5\}$ calculated under the assumption of a triangular correlation function in the constant and linear regression model with the exponential correlation function $\rho(x) = e^{-\lambda|x|}$

λ           0.1     0.3     0.5     0.7     0.9
Constant    0.999   0.997   0.978   0.946   0.905
Linear      0.999   0.999   0.991   0.974   0.950
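The eigenvalue identities used in the proof of Theorem 4.3 lend themselves to a quick numerical check. The following sketch evaluates $\int \rho(x-u) f_i(u)\,\xi^*(du)$ for both designs of the theorem; the function names and the test values $\lambda = 0.4$ and $\lambda = 3$ are illustrative choices, not taken from the paper.

```python
def rho(x, lam):
    """Triangular correlation function (4.6)."""
    return max(0.0, 1.0 - lam * abs(x))


def design_a():
    """Two-point design of part (a): points -1 and 1 with weights 1/2."""
    return [(-1.0, 0.5), (1.0, 0.5)]


def design_b(lam):
    """Equally weighted design of part (b) on the points -1 + k/lam."""
    n = 2 * lam + 1
    return [(-1.0 + k / lam, 1.0 / n) for k in range(n)]


def kernel_integral(i, x, design, lam):
    """int rho(x - u) f_i(u) xi(du) with f_1(u) = 1 and f_2(u) = u."""
    return sum(w * rho(x - u, lam) * (1.0 if i == 1 else u) for u, w in design)


def second_moment(design):
    """The entry gamma = int u^2 xi(du) of the matrix M(xi)."""
    return sum(w * u * u for u, w in design)


x = 0.2
print(kernel_integral(1, x, design_a(), 0.4), 1.0 - 0.4)        # lambda_1 = 1 - lambda
print(kernel_integral(2, x, design_a(), 0.4), 0.4 * x)          # lambda_2 * x = lambda * x
print(kernel_integral(1, x, design_b(3), 3), 1.0 / 7.0)         # lambda_1 = 1/(2 lambda + 1)
print(kernel_integral(2, x, design_b(3), 3), x / 7.0)           # lambda_2 * x
print(second_moment(design_b(3)), (3 + 1) / (3 * 3))            # gamma = (lambda + 1)/(3 lambda)
```

In part (b) the kernel acts like a piecewise linear interpolation on the grid with spacing $1/\lambda$, which is why constant and linear functions are reproduced up to the common factor $1/(2\lambda+1)$.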



The designs provided in Theorem 4.3 are also optimal for the location scale 
model; see Zhigljavsky, Dette and Pepelyshev (2010). However, unlike the 
results of previous subsections the result of Theorem 4.3 cannot be extended 
to polynomial models of higher order. 

We conclude this section with an example illustrating the efficiency of the 
designs for the triangular kernel in models with correlation structure defined 
by the exponential kernel. 

Example 4.1. Consider the location scale model [$f(x) = 1$] and the linear regression model [$f(x) = (1, x)^T$], $\mathcal{X} = [-1, 1]$, and the correlation function $\rho(x) = \exp\{-\lambda|x|\}$. In Table 1 we display the D-efficiencies of the universally optimal design calculated under the assumption of the triangular kernel (4.6) for various values of the parameter $\lambda \in [0.1, 0.9]$. For this design we observe in all cases a D-efficiency of at least 90%. In most cases it is higher than 95%.

4.4. Polynomial regression models and singular kernels. In this section we consider the polynomial regression model, that is, $f(x) = (1, x, \ldots, x^{m-1})^T$, with logarithmic covariance kernel

(4.7) $$K(u,v) = \gamma - \beta\ln(u - v)^2, \qquad \beta > 0,\ \gamma > 0,$$

and the kernel

(4.8) $$K(u,v) = \gamma + \beta/|u - v|^\alpha, \qquad 0 < \alpha < 1,\ \gamma > 0,\ \beta > 0,$$

for which the universally optimal designs can be found explicitly.

Covariance functions $K(u,v)$ with a singularity at $u = v$ appear naturally as approximations to many standard covariance functions $K(u,v) = \sigma^2\rho(u - v)$ with $\rho(0) = 1$ if $\sigma^2$ is large. A general scheme for this type of approximation is investigated in Zhigljavsky, Dette and Pepelyshev (2010), Section 4. More precisely, these authors discussed the case where the covariance kernel can be represented as $\sigma^2\rho_\delta(t) = r * h_\delta(t)$ with a singular kernel $r(t)$ and a smoothing kernel $h_\delta(\cdot)$ (here $\delta$ is a smoothing parameter and $*$ denotes the convolution operator). The basic idea is illustrated in the following example.

Fig. 4. The logarithmic covariance kernel $r(t) = -\ln t^2$ and the covariance kernel (4.9), where $\delta = 0.02, 0.05, 0.1$.

Example 4.2. Consider the covariance kernel $K(u,v) = \rho_\delta(u - v)$, where

(4.9) $$\rho_\delta(t) = 2 - \frac{1}{\delta}\log\frac{|t + \delta|^{t+\delta}}{|t - \delta|^{t-\delta}}.$$

For several values of $\delta$, the function $\rho_\delta$ is displayed in Figure 4. A straightforward calculation shows that $\rho_\delta(t) = r * h_\delta(t)$, where $r(t) = -\ln t^2$ and $h_\delta$ is the density of the uniform distribution on the interval $[-\delta, \delta]$. As illustrated by Figure 4, the function $\rho_\delta(\cdot)$ is well approximated by the singular kernel $r(\cdot)$ if $\delta$ is small.
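The convolution identity of Example 4.2 can be checked numerically. The sketch below compares the closed form (4.9) with a midpoint-rule approximation of $(r * h_\delta)(t) = \frac{1}{2\delta}\int_{t-\delta}^{t+\delta}(-\ln s^2)\,ds$. The function names and the evaluation points are illustrative, and the comparison is restricted to $|t| > \delta$ so that the integrand stays bounded.

```python
import math


def rho_delta(t, delta):
    """Closed form (4.9); requires |t| != delta so that both logarithms exist."""
    upper = (t + delta) * math.log(abs(t + delta))
    lower = (t - delta) * math.log(abs(t - delta))
    return 2.0 - (upper - lower) / delta


def convolution(t, delta, n=20000):
    """Midpoint-rule value of (r * h_delta)(t) with r(s) = -ln(s^2)."""
    h = 2.0 * delta / n
    total = 0.0
    for i in range(n):
        s = t - delta + (i + 0.5) * h
        total += -math.log(s * s)
    return total * h / (2.0 * delta)


for t in (0.3, 0.6, 1.0):
    print(t, rho_delta(t, 0.05), convolution(t, 0.05), -math.log(t * t))
```

The first two printed columns agree up to the quadrature error, and both are close to the singular kernel $r(t)$ in the last column, which is the approximation property illustrated by Figure 4.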

In Figure 5 we display the D-optimal designs (constructed numerically) for the quadratic model with a stationary error process with covariance kernel $K(u, u + t) = \rho_\delta(t)$, where $\rho_\delta$ is defined in (4.9) and $\delta = 0.02, 0.05, 0.1$. As one can see, for small $\delta$ these designs are very close to the arcsine design, which is the D-optimal design for the quadratic model and the logarithmic kernel, as proved in Theorem 4.5 of the following section.

Fig. 5. Density functions corresponding to the D-optimal designs for the quadratic model with covariance kernel (4.9), where $\delta = 0.02$ (left), $\delta = 0.05$ (middle) and $\delta = 0.1$ (right). The y-axis corresponds to values of the density functions. The corresponding designs are obtained by (2.3), where $a^{-1}$ is the distribution function corresponding to the displayed densities. The grey line corresponds to the arcsine density $p(x) = 1/(\pi\sqrt{1 - x^2})$.

Table 2
Efficiency of the arcsine design $\xi_a$ for the quadratic model and the kernel (4.9)

δ           0.02    0.04    0.06    0.08    0.1
Eff(ξ_a)    0.998   0.978   0.966   0.949   0.936

In Table 2 we show the efficiency of the arcsine distribution [obtained by maximizing $\det(D(\xi))$ with the logarithmic kernel] in the quadratic regression model with the kernel (4.9). We observe a very high efficiency with respect to the D-optimality criterion. Even in the case $\delta = 0.1$ the efficiency is 93.6% and it converges quickly to 100% as $\delta$ approaches 0.



Example 4.3. The arcsine density can also be used as an alternative approximation to the exponential correlation function or correlation functions of a similar type, that is, $\rho_{\lambda,\nu}(t) = \exp(-\lambda|t|^\nu)$. For the case $\lambda = \nu = 1$ a suitably scaled logarithmic function of the form $a(1 - b\ln t^2)$ can be considered as a reasonable approximation to $\exp(-|t|)$ on the interval $[-1, 1]$; see the left part of Figure 6. Similarly, if $\lambda = 1$, $\nu = 1/4$, it is illustrated in the right part of Figure 6 that a function of the form $a' - b'\ln t^2$ provides a very accurate approximation of the exponential correlation function. As a consequence, the arcsine design (optimal for the logarithmic kernel) will also have a high efficiency with respect to these kernels, and this argument is illustrated in Table 3 of Section 5.2 where we calculate the D-efficiencies of the arcsine design in polynomial regression models with correlation function $\exp(-|t|)$. For the correlation function $\exp(-\lambda|t|^{1/4})$ a similar D-efficiency of the arcsine design can be observed. For example, if $\lambda = 0.5, 2.5$ the D-efficiencies of the arcsine design in the linear regression model are 100% and 96.9%, respectively, while they are 99.9% and 97.1% in the quadratic model. Other choices of $\lambda$ and $\nu$ yield similar results, which are not displayed for the sake of brevity.

Fig. 6. Left panel: the logarithmic approximation (solid line) of the exponential correlation function $\exp(-|t|)$. Right panel: the logarithmic approximation (solid line) of the exponential correlation function $\exp(-|t|^{1/4})$ (dashed line).

4.4.1. Optimality of the arcsine design. We will need the following lemma, 
which states a result in the theory of Fredholm-Volterra integral equations; 
see Mason and Handscomb (2003), Chapter 9, page 211. 

Lemma 4.1. The Chebyshev polynomials of the first kind $T_n(x) = \cos(n\arccos x)$ are the eigenfunctions of the integral operator with the kernel $H(x,v) = -\ln(x - v)^2/\sqrt{1 - v^2}$. More precisely, for all $n = 0, 1, \ldots$ we have

$$\lambda_n T_n(x) = -\int_{-1}^{1} T_n(v)\ln(x - v)^2\,\frac{dv}{\pi\sqrt{1 - v^2}}, \qquad x \in [-1, 1],$$

where $\lambda_0 = 2\ln 2$ and $\lambda_n = 2/n$ for $n \ge 1$.
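The eigen-relation of Lemma 4.1 can be illustrated numerically. With the substitution $v = \cos\varphi$ the integral becomes $-\frac{1}{\pi}\int_0^\pi \cos(n\varphi)\ln(x - \cos\varphi)^2\,d\varphi$, which the sketch below approximates by a midpoint rule. The integrand has an integrable logarithmic singularity at $\varphi = \arccos x$, so the agreement is only up to a quadrature error of roughly $10^{-4}$; the function names and the grid size are illustrative choices.

```python
import math


def lhs(n, x, m=50000):
    """-(1/pi) * int_0^pi cos(n phi) ln (x - cos phi)^2 dphi via the midpoint rule."""
    h = math.pi / m
    total = 0.0
    for i in range(m):
        phi = (i + 0.5) * h
        total += math.cos(n * phi) * math.log((x - math.cos(phi)) ** 2)
    return -total * h / math.pi


def rhs(n, x):
    """lambda_n * T_n(x) with lambda_0 = 2 ln 2 and lambda_n = 2/n for n >= 1."""
    lam = 2.0 * math.log(2.0) if n == 0 else 2.0 / n
    return lam * math.cos(n * math.acos(x))


for n in range(4):
    print(n, lhs(n, 0.3), rhs(n, 0.3))
```

The same computation with a non-arcsine weight in place of $d\varphi/\pi$ would in general not return a multiple of $T_n(x)$, which is the content of the uniqueness result stated next.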

With the next result we address the problem of uniqueness of the op- 
timal design. In particular, we give a new characterization of the arcsine 
distribution. A proof can be found in the Appendix. 

Theorem 4.4. Let $n$ be a nonnegative integer and $\zeta$ be a random variable supported on the interval $[-1, 1]$. Then the distribution of $\zeta$ has the arcsine density (3.10) if and only if the equality

$$E\,T_n(\zeta)(-\ln(\zeta - x)^2) = c_n T_n(x)$$

holds for almost all $x \in [-1, 1]$, where $c_n = 2/n$ if $n \in \mathbb{N}$ and $c_0 = 2\ln 2$ if $n = 0$.

The following result is an immediate consequence of Theorems 3.3 and 4.4. 

Theorem 4.5. Consider the polynomial regression model (1.1) with $f(x) = (1, x, x^2, \ldots, x^{m-1})^T$, $x \in [-1, 1]$, and the covariance kernel (4.7). Then the arcsine design $\xi_a$ with density (3.10) is the universally optimal design.

Proof. We assume without loss of generality that $\beta = 1$ and consider the function $\rho(x) = -\ln x^2 + \gamma$ with positive $\gamma$. From Lemma 4.1 we obtain

$$\int (-\ln(u - x)^2 + \gamma)\,T_n(u)\,p(u)\,du = -\int \ln(u - x)^2\,T_n(u)\,p(u)\,du + \gamma\delta_{n0} = \lambda_n T_n(x) + \gamma\delta_{n0},$$

where $p$ denotes the arcsine density, $\delta_{xy}$ denotes Kronecker's symbol and we have used the fact that $\int_{-1}^{1} T_n(u)/\sqrt{1 - u^2}\,du = 0$ whenever $n \ge 1$. This proves (3.6) for the arcsine distribution and the vector $t(x) = (T_0(x), \ldots, T_{m-1}(x))^T$, where the function $g(x)$ is equal to 0 for all $x$. Now $f(x) = (1, x, \ldots, x^{m-1})^T = L\,t(x)$ for some nonsingular $m \times m$ matrix $L$. Therefore (3.6) holds also for the vector $f(x)$ with $g(x) = 0$ (and a different matrix $\Lambda$). The statement of the theorem now follows from Theorems 3.3 and 4.4. □



4.4.2. Generalized arcsine designs. For a G (0, 1) consider the Gegen- 

bauer polynomials Cm\x) which are orthogonal with respect to the weight 
function 

(4T0) Pa (x) = (1 " * 2 t- 1)/2 , xGl-1,1]. 

For the choice $\alpha = 0$ the Gegenbauer polynomials are proportional to the Chebyshev polynomials of the first kind $T_m(x)$. Throughout this paper we will call the corresponding beta-distributions generalized arcsine designs, emphasizing the fact that the distribution is symmetric and the parameter $\alpha$ varies in the interval $(0, 1)$. The following result [from the theory of Fredholm-Volterra integral equations of the first kind with special kernel, see Fahmy, Abdou and Darwish (1999)] establishes an analogue of Lemma 4.1 for the kernel

(4.11) $$H(u,v) = \frac{1}{|u - v|^\alpha (1 - v^2)^{(1-\alpha)/2}}.$$

Lemma 4.2. The Gegenbauer polynomials $C_n^{(\alpha/2)}(x)$ are the eigenfunctions of the integral operator with the kernel defined in (4.11). More precisely, for all $n = 0, 1, \ldots$ we have

$$\lambda_n C_n^{(\alpha/2)}(x) = \int_{-1}^{1} \frac{1}{|x - v|^\alpha}\,C_n^{(\alpha/2)}(v)\,\frac{dv}{(1 - v^2)^{(1-\alpha)/2}}$$

for all $x \in [-1, 1]$, where $\lambda_n = \dfrac{\pi\,\Gamma(n + \alpha)}{\cos(\alpha\pi/2)\,\Gamma(\alpha)\,n!}$.

The following result generalizes Theorem 8 of Zhigljavsky, Dette and Pe- 
pelyshev (2010) from the case of a location scale model to polynomial re- 
gression models. 

Theorem 4.6. Consider the polynomial regression model (1.1) with $f(x) = (1, x, x^2, \ldots, x^{m-1})^T$, $x \in [-1, 1]$, and covariance kernel (4.8). Then the design with generalized arcsine density defined in (4.10) is universally optimal.

Proof. It is easy to see that the optimal design does not depend on $\beta$ and we thus assume that $\beta = 1$ in (4.8). To prove the statement for the kernel $\rho(x) = 1/|x|^\alpha + \gamma$ with positive $\gamma$ we recall the definition of $p_\alpha$ in (4.10) and obtain from Lemma 4.2

$$\int \Big(\frac{1}{|u - x|^\alpha} + \gamma\Big) C_n^{(\alpha/2)}(u)\,p_{\alpha/2}(u)\,du = \int \frac{1}{|u - x|^\alpha}\,C_n^{(\alpha/2)}(u)\,p_{\alpha/2}(u)\,du \propto C_n^{(\alpha/2)}(x)$$

for any $n \in \mathbb{N}$ since $\int C_n^{(\alpha/2)}(u)\,p_{\alpha/2}(u)\,du = 0$. Consider the design $\xi$ with density $p_{\alpha/2}$. For this design, the function $g(x)$ defined in (3.6) is identically zero; this follows from the formula above. It now follows by the same arguments as given at the end of the proof of Theorem 4.5 that the design with density $p_{\alpha/2}$ is universally optimal. □

5. Numerical construction of optimal designs. 

5.1. An algorithm for computing optimal designs. Numerical computa- 
tion of optimal designs for a common linear regression model (1.1) with 
given correlation function can be performed by an extension of the multi- 
plicative algorithm proposed by Dette, Pepelyshev and Zhigljavsky (2008) 
for the case of noncorrelated observations. Note that the proposed algorithm 
constructs a discrete design which can be considered as an approximation 
to a design which satisfies the necessary conditions of optimality of Theorem 3.1. By choosing a fine discretization $\{x_1, \ldots, x_n\}$ of the design space $\mathcal{X}$ and running the algorithm long enough, the approximation error can be made arbitrarily small (in the case when convergence is achieved).

Denote by $\xi^{(r)} = \{x_1, \ldots, x_n; w_1^{(r)}, \ldots, w_n^{(r)}\}$ the design at the iteration $r$, where $w_1^{(0)}, \ldots, w_n^{(0)}$ are nonzero weights, for example, uniform. We propose the following updating rule for the weights:

(5.1) $$w_i^{(r+1)} = \frac{w_i^{(r)}\big(\psi(x_i, \xi^{(r)}) - \beta_r\big)}{\sum_{j=1}^{n} w_j^{(r)}\big(\psi(x_j, \xi^{(r)}) - \beta_r\big)}, \qquad i = 1, \ldots, n,$$

where $\beta_r$ is a tuning parameter [the only condition on $\beta_r$ is the positivity of all the weights in (5.1)], $\psi(x,\xi) = \varphi(x,\xi)/b(x,\xi)$ and the functions $\varphi(x,\xi)$ and $b(x,\xi)$ are defined in (3.2) and (3.3), respectively. Condition (3.5) takes the form $\psi(x,\xi^*) \le 1$ for all $x \in \mathcal{X}$. Rule (5.1) means that at the next iteration the weight of a point $x = x_j$ increases if condition (3.5) does not hold at this point.
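A single step of rule (5.1) can be sketched as follows. The function below only performs the reweighting; the values of $\psi(x_i, \xi^{(r)})$ are passed in as plain numbers because $\varphi$ and $b$ from (3.2) and (3.3) are not reproduced here. The function name, the sample values of $\psi$ and the choice $\beta_r = 0.5$ are hypothetical and serve only as an illustration.

```python
def update_weights(weights, psi_values, beta):
    """One step of (5.1): w_i <- w_i (psi_i - beta) / sum_j w_j (psi_j - beta)."""
    scaled = [w * (p - beta) for w, p in zip(weights, psi_values)]
    # beta must keep every updated weight positive (weights are assumed nonzero)
    if any(s <= 0.0 for s in scaled):
        raise ValueError("beta must keep all updated weights positive")
    total = sum(scaled)
    return [s / total for s in scaled]


w = [0.25, 0.25, 0.25, 0.25]        # uniform starting weights
psi = [1.3, 0.9, 0.8, 1.0]          # hypothetical values of psi(x_i, xi^(r))
w_next = update_weights(w, psi, beta=0.5)
print(w_next)
```

In this toy step the first point, where $\psi > 1$ and condition (3.5) is violated, gains weight, while the points with $\psi < 1$ lose weight; the updated weights again sum to one.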

A measure $\xi^*$ is a fixed point of the iteration (5.1) if and only if $\psi(x, \xi^*) = 1$ for all $x \in \mathrm{supp}(\xi^*)$ and $\psi(x, \xi^*) \le 1$ for all $x \in \mathcal{X} \setminus \mathrm{supp}(\xi^*)$. That is, a design $\xi^*$ is a fixed point of the iteration (5.1) if and only if it satisfies the optimality condition of Theorem 3.1. We were not able to theoretically prove the convergence of iterations (5.1) to the design satisfying the optimality condition of Theorem 3.1, but we observed this convergence in all numerical studies. In particular, for the cases where we could derive the optimal designs explicitly, we observed convergence of the algorithm to the optimal design.

Algorithm (5.1) can be easily extended to cover the case of singular co- 
variance kernels. Alternatively, a singular kernel can be approximated by 
a nonsingular one using the technique described in Zhigljavsky, Dette and 
Pepelyshev (2010), Section 4. 

5.2. Efficiencies of the uniform and arcsine densities. In the present section we numerically study the efficiency (with respect to the D-optimality criterion) of the uniform and arcsine designs for the polynomial model (1.1) with $f(x) = (1, x, \ldots, x^{m-1})^T$ and the exponential correlation function $\rho(t) = e^{-\lambda|t|}$, $t \in [-1, 1]$. We determine the efficiency of a design $\xi$ as

$$\mathrm{Eff}(\xi) = \Big(\frac{\det D(\xi^*)}{\det D(\xi)}\Big)^{1/m},$$

where $\xi^*$ is the design computed by the algorithm described in the previous section (applied to the D-optimality criterion). The results are depicted in Table 3. We observe that the efficiency of the arcsine design is always higher than the efficiency of the uniform design. Moreover, the absolute difference between the efficiencies of the two designs increases as $m$ increases. On the other hand, in most cases the efficiency of the uniform design and the arcsine design decreases as $m$ increases.

Table 3
Efficiencies of the uniform design $\xi_u$ and the arcsine design $\xi_a$ for the polynomial regression model of degree $m - 1$ and the exponential correlation function $\rho(x) = e^{-\lambda|x|}$

          λ           0.5     1.5     2.5     3.5     4.5     5.5
m = 1     Eff(ξ_u)    0.913   0.888   0.903   0.919   0.933   0.944
          Eff(ξ_a)    0.966   0.979   0.987   0.980   0.968   0.954
m = 2     Eff(ξ_u)    0.857   0.832   0.847   0.867   0.886   0.901
          Eff(ξ_a)    0.942   0.954   0.970   0.975   0.973   0.966
m = 3     Eff(ξ_u)    0.832   0.816   0.826   0.842   0.860   0.876
          Eff(ξ_a)    0.934   0.938   0.954   0.968   0.976   0.981
m = 4     Eff(ξ_u)    0.826   0.818   0.823   0.835   0.849   0.864
          Eff(ξ_a)    0.934   0.936   0.945   0.957   0.967   0.975

6. Conclusions. In this paper we have addressed the problem of constructing optimal designs for least squares estimation in regression models with correlated observations. The main challenge in problems of this type is that, in contrast to "classical" optimal design theory for uncorrelated data, the corresponding optimality criteria are not convex (except for the location scale model). By relating the design problem to an integral operator problem, universally optimal designs can be identified explicitly for a broad class of regression models and correlation structures. Particular attention is paid to a trigonometric regression model involving only cosine terms, where it is proved that the uniform distribution is universally optimal for any periodic kernel of the form $K(u,v) = \rho(u - v)$. For the classical polynomial regression model with a covariance kernel given by the logarithmic potential it is proved that the arcsine distribution is universally optimal. Moreover, optimal designs are derived for several other regression models.

So far optimal designs for regression models with correlated observations have only been derived explicitly for the location scale model, and to the best of our knowledge the results presented in this paper provide the first explicit solutions to this type of problem for a general class of models with more than one parameter.

We have concentrated on the construction of optimal designs for least 
squares estimation (LSE) because the best linear unbiased estimator (BLUE) 
requires the knowledge of the correlation matrix. While the BLUE is often 
sensitive with respect to misspecification of the correlation structure, the 
corresponding optimal designs for the LSE show a remarkable robustness. 
Moreover, the difference between BLUE and LSE is often surprisingly small, 
and in many cases BLUE and LSE with certain correlation functions are 
asymptotically equivalent; see Rao (1967), Kruskal (1968). 

Indeed, consider the location scale model $y(x) = \theta + \varepsilon(x)$ with $K(u,v) = \rho(u - v)$, where the knowledge of a full trajectory of a process $y(x)$ is available. Define the (linear unbiased) estimate $\hat\theta(G) = \int y(x)\,dG(x)$, where $G(x)$ is a distribution function of a signed probability measure. A celebrated result of Grenander (1950) states that the "estimator" $\hat\theta(G^*)$ is BLUE if and only if $\int \rho(u - x)\,dG^*(u)$ is constant for all $x \in \mathcal{X}$. This result was extended by Näther [(1985a), Section 4.3] to the case of random fields with constant mean. Consequently, if $G^*(x)$ is a distribution function of a nonsigned (rather than signed) probability measure, then LSE coincides with BLUE and an asymptotic optimal design for LSE is also an asymptotic optimal design for BLUE. Hajek (1956) proved that $G^*$ is a distribution function of a nonsigned probability measure if the correlation function $\rho$ is convex on the interval $(0, \infty)$. Zhigljavsky, Dette and Pepelyshev (2010) showed that $G^*$ is a proper distribution function for certain families of correlation functions including nonconvex ones.

In Theorem 3.3 we have characterized the cases where there exist universally optimal designs for ordinary least squares estimation. Specifically, a design $\xi^*$ is universally optimal for least squares estimation if and only if condition (3.6) with $g(x) = 0$ is satisfied. Moreover, the proof of Theorem 3.3 shows that in this case the signed vector-valued measure

$$\mu(dx) = M^{-1}(\xi^*) f(x)\,\xi^*(dx)$$

associated with $\xi^*$ and the LSE minimizes (with respect to the Loewner ordering) the matrix

$$\int\!\!\int K(x,u)\,\mu(dx)\,\mu^T(du)$$

in the space $\mathcal{M}$ of all vector-valued signed measures. Because this matrix is the covariance of the linear estimate $\int y(x)\,\mu(dx)$ (where $\mu$ is a vector of signed measures) it follows that under the assumptions of Theorem 3.3, the LSE combined with the universally optimal design $\xi^*$ gives exactly the same asymptotic covariance matrix as the BLUE and the optimal design for the BLUE.

APPENDIX: SOME TECHNICAL DETAILS 

Proof of Lemma 3.3. For any $c \in \mathbb{R}^m$ and $\mu \in \mathcal{M}$ we set $\nu(\cdot) = c^T\mu(\cdot)$, where $\nu(dx)$ is a signed measure on $\mathcal{X}$. Then the functional

$$\Phi_c(\mu) = c^T\int\!\!\int K(x,u)\,\mu(dx)\,\mu^T(du)\,c$$

can also be written as

$$\Phi_c(\mu) = \Psi(\nu) = \int\!\!\int K(x,u)\,\nu(dx)\,\nu(du).$$

For any $\alpha \in [0, 1]$ and any two signed measures $\nu_0$ and $\nu_1$ on $\mathcal{X}$ we have

$$\Psi(\alpha\nu_0 + (1 - \alpha)\nu_1) = \int\!\!\int K(u,v)\,[\alpha\nu_0(du) + (1 - \alpha)\nu_1(du)]\,[\alpha\nu_0(dv) + (1 - \alpha)\nu_1(dv)]$$

$$= \alpha^2\int\!\!\int K(u,v)\,\nu_0(du)\,\nu_0(dv) + (1 - \alpha)^2\int\!\!\int K(u,v)\,\nu_1(du)\,\nu_1(dv) + 2\alpha(1 - \alpha)\int\!\!\int K(u,v)\,\nu_0(du)\,\nu_1(dv)$$

$$= \alpha^2\Psi(\nu_0) + (1 - \alpha)^2\Psi(\nu_1) + 2\alpha(1 - \alpha)\int\!\!\int K(u,v)\,\nu_0(du)\,\nu_1(dv)$$

$$= \alpha\Psi(\nu_0) + (1 - \alpha)\Psi(\nu_1) - \alpha(1 - \alpha)A,$$

where

$$A = \int\!\!\int K(u,v)\,[\nu_0(du)\nu_0(dv) + \nu_1(du)\nu_1(dv) - 2\nu_0(du)\nu_1(dv)] = \int\!\!\int K(u,v)\,\zeta(du)\,\zeta(dv)$$

and $\zeta(du) = \nu_0(du) - \nu_1(du)$. In view of (2.9), we have $A \ge 0$ and therefore the functional $\Psi(\cdot)$ is convex. □

Proof of Lemma 3.4. As vectors $a$ and $b$ are linearly independent, we have $a^T a > 0$, $b^T b > 0$ and $(a')^T b' < 1$, where $a' = a/\sqrt{a^T a}$ and $b' = b/\sqrt{b^T b}$. For any vector $c \in \mathbb{R}^m$, we can represent $S_c$ as

$$S_c = c^T a b^T c = \sqrt{a^T a \cdot b^T b}\cdot c^T a' \cdot c^T b'.$$

With the choice $c = a' - b'$ it follows

$$c^T a' = 1 - (a')^T b' > 0 \quad\text{and}\quad c^T b' = (a')^T b' - 1 < 0,$$

implying $S_c < 0$. □
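The construction in this proof is easy to try out numerically. The sketch below builds $c = a' - b'$ for two linearly independent vectors and evaluates $S_c = (c^T a)(b^T c)$; the function names and the sample vectors are illustrative only.

```python
import math


def negative_direction(a, b):
    """Return c = a/|a| - b/|b|, the choice made in the proof of Lemma 3.4."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return [x / norm_a - y / norm_b for x, y in zip(a, b)]


def quadratic_form(c, a, b):
    """Evaluate S_c = c^T a b^T c = (c . a) * (b . c)."""
    ca = sum(x * y for x, y in zip(c, a))
    cb = sum(x * y for x, y in zip(c, b))
    return ca * cb


a = [1.0, 2.0, 0.5]
b = [0.3, -1.0, 2.0]
c = negative_direction(a, b)
s_c = quadratic_form(c, a, b)
print(c, s_c)
```

For linearly independent $a$ and $b$ the printed value of $S_c$ is negative; if $b$ were a positive multiple of $a$, the vector $c$ would vanish and the construction would break down, which is why linear independence is assumed.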

Proof of Theorem 4.4. Note that the part "if" of the statement fol- 
lows from Lemma 4.1, and we should prove the part "only if." Nevertheless, 
we provide a proof of the part "if" since it will be the base for proving the 
part "only if." 

Since the statement for $n = 0$ is proved in Schmidt and Zhigljavsky (2009),
we consider the case $n \in \mathbb{N}$ in the rest of the proof. Using the transformation
$\varphi = \arccos u$ and $\psi = \arccos x$, we obtain $T_n(\cos\varphi) = \cos(n\varphi)$ and

\[
\int_{-1}^{1} \frac{\ln(u-x)^2}{\pi\sqrt{1-u^2}}\, T_n(u)\,du
= \int_0^{\pi} \frac{\ln(\cos\varphi-\cos\psi)^2}{\pi\sin\varphi}\,\cos(n\varphi)\sin\varphi\,d\varphi.
\]

Consequently, in order to prove Theorem 4.4 we have to show that the
function

\[
\int_0^{\pi} \ln(\cos\varphi-\cos\psi)^2 \cos(n\varphi)\,\mu(d\varphi)
\]

is proportional to $\cos(n\psi)$ if and only if $\mu$ has a uniform density on the
interval $[0,\pi]$. Extending $\mu$ to the interval $[0,2\pi]$ as a measure that is symmetric
with respect to the center $\pi$, $\mu(A) = \mu(2\pi - A)$, and defining the measure $\tilde\mu$ as
$\tilde\mu(A) = \mu(2A)/2$ for all Borel sets $A \subset [0,\pi]$, we obtain

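\begin{align*}
\int_0^{\pi} \ln(\cos\varphi-\cos\psi)^2 \cos(n\varphi)\,\mu(d\varphi)
&= \frac12 \int_0^{2\pi} \cos(n\varphi)\ln(\cos\varphi-\cos\psi)^2\,\mu(d\varphi) \\
&= \frac12 \int_0^{2\pi} \cos(n\varphi)\ln\Bigl(2\sin\frac{\varphi-\psi}{2}\,\sin\frac{\varphi+\psi}{2}\Bigr)^2\mu(d\varphi) \\
&= \frac12 \int_0^{2\pi} \cos(n\varphi)\ln 2^2\,\mu(d\varphi)
 + \frac12 \int_0^{2\pi} \cos(n\varphi)\ln\Bigl(\sin\frac{\varphi-\psi}{2}\Bigr)^2\mu(d\varphi) \\
&\quad + \frac12 \int_0^{2\pi} \cos(n\varphi)\ln\Bigl(\sin\frac{\varphi+\psi}{2}\Bigr)^2\mu(d\varphi) \\
&= \ln 2^2 \int_0^{\pi} \cos(2n\varphi)\,\tilde\mu(d\varphi)
 + \int_0^{\pi} \cos(2n\varphi)\ln\sin^2(\varphi-\psi/2)\,\tilde\mu(d\varphi) \\
&\quad + \int_0^{\pi} \cos(2n\varphi)\ln\sin^2(\varphi+\psi/2)\,\tilde\mu(d\varphi) \\
&= \ln 2^2 \int_0^{\pi} \cos(2n\varphi)\,\tilde\mu(d\varphi)
 + 2\int_0^{\pi} \cos(2n\varphi-n\psi+n\psi)\ln\sin^2(\varphi-\psi/2)\,\tilde\mu(d\varphi) \\
&= \ln 2^2 \int_0^{\pi} \cos(2n\varphi)\,\tilde\mu(d\varphi)
 + 2\cos(n\psi)\int_0^{\pi} \cos(2n\varphi-n\psi)\ln\sin^2(\varphi-\psi/2)\,\tilde\mu(d\varphi) \\
&\quad - 2\sin(n\psi)\int_0^{\pi} \sin(2n\varphi-n\psi)\ln\sin^2(\varphi-\psi/2)\,\tilde\mu(d\varphi).
\end{align*}
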



The "if" part follows from the facts that the functions $\cos(2nz)\ln\sin^2(z)$
and $\sin(2nz)\ln\sin^2(z)$ are $\pi$-periodic and

\begin{align*}
\int_0^{\pi} \sin(2n\varphi-n\psi)\ln\sin^2(\varphi-\psi/2)\,\frac{d\varphi}{\pi}
&= \int_0^{\pi} \sin(2n\varphi)\ln\sin^2(\varphi)\,\frac{d\varphi}{\pi} = 0, \\
\int_0^{\pi} \cos(2n\varphi-n\psi)\ln\sin^2(\varphi-\psi/2)\,\frac{d\varphi}{\pi}
&= \int_0^{\pi} \cos(2n\varphi)\ln\sin^2(\varphi)\,\frac{d\varphi}{\pi} = -1/n.
\end{align*}
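
Taken together, these two values imply that for the uniform density $1/\pi$ on $[0,\pi]$ the function considered in this proof equals $-(2/n)\cos(n\psi)$. The sketch below checks this numerically; it is only an illustration, and the midpoint quadrature, the grid size and the function name `log_potential_of_cosine` are assumptions of this example. The point $\psi$ is placed on a cell boundary so that no quadrature node hits the logarithmic singularity at $\varphi = \psi$.

```python
import numpy as np

def log_potential_of_cosine(n, j, num_cells=200_000):
    """Midpoint-rule approximation of
        (1/pi) * int_0^pi ln(cos(phi) - cos(psi))^2 * cos(n*phi) dphi
    with psi = pi * j / num_cells lying on a cell boundary."""
    h = np.pi / num_cells
    psi = j * h
    phi = (np.arange(num_cells) + 0.5) * h          # cell midpoints
    integrand = np.log((np.cos(phi) - np.cos(psi)) ** 2) * np.cos(n * phi)
    return psi, integrand.sum() * h / np.pi

results = []
for n in (1, 2, 3):
    for j in (40_000, 100_000, 150_000):
        psi, value = log_potential_of_cosine(n, j)
        # Pair the quadrature value with the predicted -(2/n) cos(n psi).
        results.append((value, -2.0 / n * np.cos(n * psi)))

# Largest deviation between quadrature and prediction over all cases.
print(max(abs(v - expected) for v, expected in results))
```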

To prove the "only if" part we need to show that the convolution of
$\cos(2nz)\ln\sin^2(z)$ and $\tilde\mu(z)$, that is,

\[
\int_0^{\pi} \cos(2n(\varphi-t))\ln\sin^2(\varphi-t)\,\tilde\mu(d\varphi),
\]

is constant for almost all $t \in [0,\pi]$ if and only if $\tilde\mu$ is uniform; and the
same holds for the convolution of $\sin(2nz)\ln\sin^2(z)$ and $\tilde\mu(z)$. This, however,
follows from Schmidt and Zhigljavsky [(2009), Lemma 3] since
$\cos(2nz)\ln\sin^2(z) \in L^2([0,\pi])$, and all complex Fourier coefficients of these
functions are nonzero. Indeed,

\begin{align*}
\int_0^{\pi} \cos(2nt)\ln\sin^2(t)\sin(2kt)\,dt &= 0 \quad \forall k \in \mathbb{Z}, \\
\int_0^{\pi} \cos(2nt)\ln\sin^2(t)\cos(2kt)\,dt &= (\gamma_{|n+k|} + \gamma_{|n-k|})/2 \quad \forall k \in \mathbb{Z},
\end{align*}

where $\gamma_0 = -2\pi\log 2$ and $\gamma_k = -\pi/k$ for $k \in \mathbb{N}$; see formula 4.384.3 in
Gradshteyn and Ryzhik (1965). □
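
The two Fourier coefficient formulas can be compared with a direct quadrature. The following is a rough sketch rather than a proof: the midpoint rule, the grid size and the helper names `gamma` and `fourier_coefficients` are assumptions of this example, and only a few values of $n$ and $k$ are tried.

```python
import numpy as np

def gamma(k):
    """gamma_0 = -2*pi*log(2) and gamma_k = -pi/k for k >= 1, as stated above."""
    return -2.0 * np.pi * np.log(2.0) if k == 0 else -np.pi / k

def fourier_coefficients(n, k, num_cells=200_000):
    """Midpoint-rule approximations of
        int_0^pi cos(2nt) ln sin^2(t) cos(2kt) dt   and
        int_0^pi cos(2nt) ln sin^2(t) sin(2kt) dt."""
    h = np.pi / num_cells
    t = (np.arange(num_cells) + 0.5) * h      # midpoints avoid t = 0 and t = pi
    f = np.cos(2 * n * t) * np.log(np.sin(t) ** 2)
    cos_coef = (f * np.cos(2 * k * t)).sum() * h
    sin_coef = (f * np.sin(2 * k * t)).sum() * h
    return cos_coef, sin_coef

errors = []
for n in (1, 2):
    for k in range(-3, 4):
        cos_coef, sin_coef = fourier_coefficients(n, k)
        expected = 0.5 * (gamma(abs(n + k)) + gamma(abs(n - k)))
        errors.append(max(abs(cos_coef - expected), abs(sin_coef)))

# Largest deviation from the stated formulas over the tried (n, k) pairs.
print(max(errors))
```

Since every $\gamma_k$ is negative, the cosine coefficients are strictly negative for all $k$, which is the nonvanishing property used in the argument.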






Acknowledgments. Parts of this paper were written during a visit of the
authors to the Isaac Newton Institute, Cambridge, UK, and the authors
would like to thank the institute for its hospitality and financial support.
We are also grateful to the referees and the Associate Editor for their
constructive comments on earlier versions of this manuscript.

REFERENCES 

Bickel, P. J. and Herzberg, A. M. (1979). Robustness of design against autocorrelation 
in time. I. Asymptotic theory, optimality for location and linear regression. Ann. Statist. 
7 77-95. MR0515685 

Bickel, P. J., Herzberg, A. M. and Schilling, M. F. (1981). Robustness of design 
against autocorrelation in time. II. Optimality, theoretical and numerical results for the 
first-order autoregressive process. J. Amer. Statist. Assoc. 76 870-877. MR0650898 

Boltze, L. and Näther, W. (1982). On effective observation methods in regression
models with correlated errors. Math. Operationsforsch. Statist. Ser. Statist. 13
507-519. MR0682028

Dette, H., Kunert, J. and Pepelyshev, A. (2008). Exact optimal designs for weighted 
least squares analysis with correlated errors. Statist. Sinica 18 135-154. MR2384982 

Dette, H., Pepelyshev, A. and Zhigljavsky, A. (2008). Improving updating rules
in multiplicative algorithms for computing D-optimal designs. Comput. Statist. Data
Anal. 53 312-320. MR2649087

Dette, H., Pepelyshev, A. and Holland-Letz, T. (2010). Optimal designs for random 
effect models with correlated errors with applications in population pharmacokinetics. 
Ann. Appl. Stat. 4 1430-1450. MR2758335 

Efromovich, S. (1999). Nonparametric Curve Estimation: Methods, Theory, and
Applications. Springer, New York. MR1705298

Efromovich, S. (2008). Optimal sequential design in a controlled non-parametric
regression. Scand. J. Stat. 35 266-285. MR2418740

Fahmy, M. H., Abdou, M. A. and Darwish, M. A. (1999). Integral equations and
potential-theoretic type integrals of orthogonal polynomials. J. Comput. Appl. Math.
106 245-254. MR1696409

Gradshteyn, I. S. and Ryzhik, I. M. (1965). Table of Integrals, Series, and Products. 
Academic Press, New York. MR0197789 

Grenander, U. (1950). Stochastic processes and statistical inference. Ark. Mat. 1 195- 
277. MR0039202 

Hájek, J. (1956). Linear estimation of the mean value of a stationary random process
with convex correlation function. Czechoslovak Math. J. 6 94-117. MR0080399

Harman, R. and Štulajter, F. (2010). Optimal prediction designs in finite discrete
spectrum linear regression models. Metrika 72 281-294. MR2725102

Kanwal, R. P. (1997). Linear Integral Equations, 2nd ed. Birkhäuser, Boston, MA.
MR1427946

Kiefer, J. (1974). General equivalence theory for optimum designs (approximate theory).
Ann. Statist. 2 849-879. MR0356386

Kiefer, J. and Wolfowitz, J. (1960). The equivalence of two extremum problems.
Canad. J. Math. 12 363-366. MR0117842

Kiselák, J. and Stehlík, M. (2008). Equidistant and D-optimal designs for parameters
of Ornstein-Uhlenbeck process. Statist. Probab. Lett. 78 1388-1396. MR2453793

Kruskal, W. (1968). When are Gauss-Markov and least squares estimators identical?
A coordinate-free approach. Ann. Math. Statist. 39 70-75. MR0222998






Mason, J. C. and Handscomb, D. C. (2003). Chebyshev Polynomials. Chapman & 
Hall/CRC, Boca Raton, FL. MR1937591 

Mehr, C. B. and McFadden, J. A. (1965). Certain properties of Gaussian processes and 
their first-passage times. J. R. Stat. Soc. Ser. B Stat. Methodol. 27 505-522. MR0199885 

Müller, W. G. and Pázman, A. (2003). Measures for designs in experiments with
correlated errors. Biometrika 90 423-434. MR1986657

Näther, W. (1985a). Effective Observation of Random Fields. Teubner-Texte zur
Mathematik [Teubner Texts in Mathematics] 72. Teubner, Leipzig. MR0863287

Näther, W. (1985b). Exact design for regression models with correlated errors. Statistics
16 479-484. MR0803486

Pázman, A. and Müller, W. G. (2001). Optimal design of experiments subject to
correlated errors. Statist. Probab. Lett. 52 29-34. MR1820047

Pukelsheim, F. (2006). Optimal Design of Experiments. Classics in Applied Mathematics 
50. SIAM, Philadelphia, PA. Reprint of the 1993 original. MR2224698 

Rao, C. R. (1967). Least squares theory using an estimated dispersion matrix and its
application to measurement of signals. In Proc. Fifth Berkeley Sympos. Math. Statist.
and Probability (Berkeley, Calif., 1965/66), Vol. I: Statistics 355-372. Univ. California
Press, Berkeley, CA. MR0212930

Sacks, J. and Ylvisaker, N. D. (1966). Designs for regression problems with correlated 
errors. Ann. Math. Statist. 37 66-89. MR0192601 

Sacks, J. and Ylvisaker, D. (1968). Designs for regression problems with correlated 
errors; many parameters. Ann. Math. Statist. 39 49-69. MR0220424 

Schmidt, K. M. and Zhigljavsky, A. (2009). A characterization of the arcsine
distribution. Statist. Probab. Lett. 79 2451-2455. MR2556310

Torsney, B. (1986). Moment inequalities via optimal design theory. Linear Algebra Appl. 
82 237-253. MR0858975 

Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation. Springer, New York. 
Revised and extended from the 2004 French original, Translated by Vladimir Zaiats. 
MR2724359 

Uciński, D. and Atkinson, A. (2004). Experimental design for time-dependent models
with correlated observations. Stud. Nonlinear Dyn. Econom. 8 13.

Zhigljavsky, A., Dette, H. and Pepelyshev, A. (2010). A new approach to optimal 
design for linear models with correlated observations. J. Amer. Statist. Assoc. 105 
1093-1103. MR2752605 



H. Dette
Fakultät für Mathematik
Ruhr-Universität Bochum
Bochum, 44780
Germany
E-mail: holger.dette@rub.de

A. Pepelyshev
Institute of Statistics
RWTH Aachen University
Aachen, 52056
Germany
E-mail: pepelyshev@stochastik.rwth-aachen.de

A. Zhigljavsky
School of Mathematics
Cardiff University
Cardiff, CF24 4AG
United Kingdom
E-mail: ZhigljavskyAA@cf.ac.uk



